The role of active learning in developing trustworthy AI - Approaches for enhancing transparency and explainability in processes and systems
The significance of Artificial Intelligence (AI) is continuously growing, influencing sectors like healthcare, finance, and other safety-critical domains. As AI systems become more integrated into society, the demand for functionally intelligent and ethically sound systems rises, making trustworthy AI a societal imperative. In 2019, the European Commission released a framework for trustworthy AI, defining ethical principles and key requirements.
AI systems rely on machine learning (ML) algorithms that analyze data to solve complex problems. Many use complex models such as deep neural networks, which make it difficult to meet explainability and transparency demands. Additionally, training these models often requires extensive labeled data, which can be costly and labor-intensive to obtain, especially in high-risk domains. Active learning (AL) emerges as a promising approach in which human experts label the most informative samples during the training process, reducing the overall annotation effort.
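To illustrate the basic idea, a minimal pool-based AL loop with uncertainty sampling might look as follows. This is a generic Python sketch, not the dissertation's actual pipeline; the dataset, model, and query strategy are placeholders chosen for illustration.

```python
# Minimal pool-based active learning loop (illustrative sketch only).
# Uncertainty sampling: the sample with the least confident prediction is
# sent to a human expert; here the expert is simulated by the held-back label.
import numpy as np
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression

X, y = make_classification(n_samples=500, n_features=20, random_state=0)
labeled = list(range(10))            # small initial labeled set
pool = list(range(10, len(X)))       # unlabeled pool

model = LogisticRegression(max_iter=1000)
for _ in range(20):                  # 20 annotation rounds
    model.fit(X[labeled], y[labeled])
    proba = model.predict_proba(X[pool])
    uncertainty = 1.0 - proba.max(axis=1)        # least-confidence score
    query = pool[int(np.argmax(uncertainty))]    # most informative sample
    # "Oracle" step: in a real AL project a domain expert provides y[query].
    labeled.append(query)
    pool.remove(query)

print(f"Labels used: {len(labeled)}, accuracy on remaining pool: "
      f"{model.score(X[pool], y[pool]):.3f}")
```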
However, AL-based AI development faces challenges due to the dynamic data landscape and high resource demands. Diverging approaches between software engineers and data scientists necessitate revising traditional process models to accommodate continuous model retraining and data variability. Harmonizing these methodologies is critical for efficient collaboration.
This dissertation proposes a revised life cycle model that integrates agile principles with AL project requirements. It enhances development efficiency and transparency by structuring workflows from planning to implementation and fostering collaboration within diverse teams. Central to this approach is an innovative methodology emphasizing principles for data, code, and automation. It provides a workflow that guides teams in establishing robust development processes while adhering to best practices.
Traceability is another critical factor for trustworthy AI, extending from data sources to model outputs, including annotations and predictions. Traditional frameworks often lack sufficient traceability mechanisms, highlighting the need for a comprehensive framework that integrates AL functionality with data provenance and artifact versioning. Such a framework would improve reproducibility and accountability.
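To make this concrete, one purely illustrative way to capture such provenance is to store, for every training run, hashes of the data, annotations, and model artifact together with the producing code version. The function and file names below are hypothetical and do not refer to any specific framework.

```python
# Hypothetical provenance record for one training run (illustrative only):
# each artifact stores hashes of its inputs plus the producing code version,
# so a prediction can be traced back to data, annotations, and model.
import hashlib
import json
import time

def sha256_of(path: str) -> str:
    with open(path, "rb") as f:
        return hashlib.sha256(f.read()).hexdigest()

def log_provenance(dataset_path: str, annotations_path: str,
                   model_path: str, git_commit: str,
                   out_path: str = "provenance.json") -> dict:
    record = {
        "timestamp": time.time(),
        "dataset_sha256": sha256_of(dataset_path),
        "annotations_sha256": sha256_of(annotations_path),
        "model_sha256": sha256_of(model_path),
        "code_version": git_commit,
    }
    with open(out_path, "w") as f:
        json.dump(record, f, indent=2)
    return record
```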
To address these needs, this dissertation introduces LIFEDATA, an open-source framework that supports end-to-end traceability and efficient data annotation in AL projects. LIFEDATA integrates components for artifact versioning, user interaction logging, and data provenance, enhancing reproducibility. Its applicability is demonstrated through use cases in the life sciences, including skin image analysis and ECG signal classification. These examples showcase how AL enhances annotation efficiency and model quality.
Explainability is equally crucial for achieving transparency in AI systems. As models grow more complex, making them understandable to humans becomes challenging. eXplainable AI (XAI) plays a central role in addressing this issue. An integrative approach is needed, employing interpretable ML (IML) methods tailored to diverse stakeholder needs.
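As a small, generic example of such an IML method (not one specific to this dissertation), permutation feature importance explains a trained model by measuring how much shuffling each input feature degrades its performance, yielding a model-agnostic global explanation.

```python
# Illustrative interpretable-ML example (generic, not dissertation-specific):
# permutation importance ranks features by the performance drop observed when
# each feature's values are shuffled on held-out data.
from sklearn.datasets import load_breast_cancer
from sklearn.ensemble import RandomForestClassifier
from sklearn.inspection import permutation_importance
from sklearn.model_selection import train_test_split

X, y = load_breast_cancer(return_X_y=True, as_frame=True)
X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=0)

model = RandomForestClassifier(random_state=0).fit(X_train, y_train)
result = permutation_importance(model, X_test, y_test,
                                n_repeats=10, random_state=0)

# Report the five most influential features for a stakeholder-facing summary.
top = result.importances_mean.argsort()[::-1][:5]
for i in top:
    print(f"{X.columns[i]}: {result.importances_mean[i]:.3f}")
```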
In this context, a domain-specific approach for skin image analysis is presented. It combines a human-centered classification method for skin lesions with AI interpretation techniques to provide human-understandable explanations. Additionally, the XAI-Compass is introduced as a tool for involving stakeholders in the development process. This concept systematically organizes roles, life cycle phases, and goals to align AI explanations with stakeholder needs. Studies conducted on ECG signal classification further explore how tailored explanations improve understanding and acceptance across different user groups.
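A deliberately simplified sketch of how such a mapping between roles, life cycle phases, and explanation goals could be represented as data is shown below. The entries are hypothetical examples; the XAI-Compass itself is a conceptual tool for structuring stakeholder involvement, not this code.

```python
# Hypothetical, simplified illustration of organizing roles, life cycle
# phases, and explanation goals so that suitable IML methods can be
# selected per audience (example entries only).
from dataclasses import dataclass

@dataclass
class ExplanationNeed:
    role: str              # e.g. clinician, data scientist, auditor
    phase: str             # life cycle phase in which the explanation is needed
    goal: str              # what the explanation should achieve
    suggested_method: str  # an IML technique that fits this need

COMPASS_SKETCH = [
    ExplanationNeed("clinician", "deployment",
                    "justify a single prediction", "example-based explanation"),
    ExplanationNeed("data scientist", "development",
                    "debug model behaviour", "feature attribution"),
    ExplanationNeed("auditor", "evaluation",
                    "assess global model logic", "global surrogate model"),
]

for need in COMPASS_SKETCH:
    print(f"{need.role} ({need.phase}): {need.goal} -> {need.suggested_method}")
```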
In conclusion, the proposed life cycle model and methodology for AL projects significantly enhance transparency and collaboration in trustworthy AI development. The LIFEDATA framework, with its modular structure and traceability focus, offers a practical solution for implementing AL in various domains, as demonstrated in the life sciences. Furthermore, the domain-specific approach to explainability and the integration of stakeholder perspectives via the XAI-Compass improve the interpretability and relevance of AI outputs for diverse users.
Overall, this dissertation presents innovative solutions to challenges in developing trustworthy AI systems, emphasizing transparency, traceability, and explainability.