Data Integration for Future Medicine (DIFUTURE): An Architectural and Methodological Overview
(2018)
Background
The PRIDE trial (NOA-28; ARO-2024-01; AG-NRO-06; NCT05871021) is designed to determine whether dose escalation to 75.0 Gy in 30 fractions can improve median overall survival (OS) in patients with methylguanine methyltransferase (MGMT) promoter-unmethylated glioblastoma compared to historical median OS rates, while remaining isotoxic to historical cohorts through the addition of concurrent bevacizumab (BEV). To ensure protocol-compliant irradiation planning across all study centers, a dummy run was performed and the resulting plan quality was evaluated.
Methods
A suitable patient case was selected and the computed tomography (CT), magnetic resonance imaging (MRI) and O-(2-[18F]fluoroethyl)-L-tyrosine (FET) positron emission tomography (PET) contours were made available. Participants at the various intended study sites performed radiation planning according to the PRIDE clinical trial protocol. The treatment plans and dose grids were uploaded as Digital Imaging and Communications in Medicine (DICOM) files to a cloud-based platform. Plan quality and protocol adherence were analyzed using a standardized checklist, scorecards and indices such as Dice Score (DSC) and Hausdorff Distance (HD).
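The two agreement indices mentioned above can be illustrated with a short computation on binary contour masks. The following is a minimal sketch, assuming the delineations have already been rasterized onto a common grid as NumPy arrays with known voxel spacing; it is not the evaluation code used in the dummy run, and the mask variables are placeholders.

```python
import numpy as np
from scipy.spatial.distance import directed_hausdorff

def dice_score(mask_a: np.ndarray, mask_b: np.ndarray) -> float:
    """Dice similarity coefficient of two binary masks: 2|A∩B| / (|A| + |B|)."""
    a, b = mask_a.astype(bool), mask_b.astype(bool)
    denom = a.sum() + b.sum()
    return 2.0 * np.logical_and(a, b).sum() / denom if denom else 1.0

def hausdorff_distance(mask_a: np.ndarray, mask_b: np.ndarray, spacing=(1.0, 1.0)) -> float:
    """Symmetric Hausdorff distance (in mm) between the voxel sets of two masks."""
    pts_a = np.argwhere(mask_a) * np.asarray(spacing)  # voxel indices -> physical coordinates
    pts_b = np.argwhere(mask_b) * np.asarray(spacing)
    return max(directed_hausdorff(pts_a, pts_b)[0],
               directed_hausdorff(pts_b, pts_a)[0])

# Toy example: compare a reference contour with a submitted one on a 2D grid.
reference = np.zeros((100, 100), dtype=bool); reference[30:70, 30:70] = True
submitted = np.zeros((100, 100), dtype=bool); submitted[33:72, 28:68] = True
print(f"DSC = {dice_score(reference, submitted):.2f}, "
      f"HD = {hausdorff_distance(reference, submitted):.1f} mm")
```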
Results
Median DSC was 0.89, 0.90 and 0.88 for PTV60, PTV60ex (planning target volume receiving 60.0 Gy in the standard and the experimental plan, respectively) and PTV75 (PTV receiving 75.0 Gy in the experimental plan), respectively. Median HD values were 17.0 mm, 13.9 mm and 12.1 mm, respectively. These differences were also evident in the volumes: the PTV60 had a volume range of 219.1–391.3 cc (median: 261.9 cc) for the standard plans, while the PTV75 volumes for the experimental plans ranged from 71.5 to 142.7 cc (median: 92.3 cc). The structures with the largest deviations in Dice score were the pituitary gland (median 0.37, range 0.00–0.69) and the right lacrimal gland (median 0.59, range 0.42–0.78).
Conclusions
The observed deviations demonstrate the necessity of systematic training with appropriate feedback before the start of clinical trials in radiation oncology, as well as continuous monitoring of protocol compliance throughout the study.
Designing an ML auditing criteria catalog as starting point for the development of a framework
(2024)
Introduction
In Multiple Sclerosis (MS), patients' characteristics and (bio)markers that reliably predict the individual disease prognosis at disease onset are lacking. Cohort studies allow a close follow-up of MS histories and a thorough phenotyping of patients. Therefore, a multicenter cohort study was initiated to implement a wide spectrum of data and (bio)markers in newly diagnosed patients.
Methods
ProVal-MS (Prospective study to validate a multidimensional decision score that predicts treatment outcome at 24 months in untreated patients with clinically isolated syndrome or early Relapsing–Remitting-MS) is a prospective cohort study in patients with clinically isolated syndrome (CIS) or Relapsing–Remitting (RR)-MS (McDonald 2017 criteria), diagnosed within the last two years, conducted at five academic centers in Southern Germany. The collection of clinical, laboratory, imaging, and paraclinical data as well as biosamples is harmonized across centers. The primary goal is to validate (discrimination and calibration) the previously published DIFUTURE MS-Treatment Decision score (MS-TDS). The score supports clinical decision-making regarding the options of early (within 6 months after study baseline) platform medication (Interferon beta, glatiramer acetate, dimethyl/diroximel fumarate, teriflunomide), or no immediate treatment (> 6 months after baseline) of patients with early RR-MS and CIS by predicting the probability of new or enlarging lesions in cerebral magnetic resonance images (MRIs) between 6 and 24 months. Further objectives are refining the MS-TDS score and providing data to identify new markers reflecting disease course and severity. The project also provides a technical evaluation of the ProVal-MS cohort within the IT-infrastructure of the DIFUTURE consortium (Data Integration for Future Medicine) and assesses the efficacy of the data sharing techniques developed.
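To illustrate what "discrimination and calibration" refer to in this validation, the sketch below computes an AUROC and a quantile-binned calibration summary for a risk score against observed outcomes. The variable names (`ms_tds`, `new_lesions`) and the synthetic data are placeholders for illustration only, not the study's analysis code.

```python
import numpy as np
from sklearn.metrics import roc_auc_score
from sklearn.calibration import calibration_curve

# Hypothetical data: predicted probability of new/enlarging MRI lesions and observed outcome.
rng = np.random.default_rng(0)
ms_tds = rng.uniform(0, 1, 500)        # placeholder predicted probabilities
new_lesions = rng.binomial(1, ms_tds)  # placeholder observed outcomes

# Discrimination: how well the score separates patients with and without new lesions.
auroc = roc_auc_score(new_lesions, ms_tds)

# Calibration: do predicted probabilities match observed event frequencies per risk decile?
obs_freq, pred_mean = calibration_curve(new_lesions, ms_tds, n_bins=10, strategy="quantile")

print(f"AUROC = {auroc:.3f}")
for p, o in zip(pred_mean, obs_freq):
    print(f"predicted {p:.2f} -> observed {o:.2f}")
```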
Perspective
Clinical cohorts provide the infrastructure to discover and validate relevant disease-specific findings. A successful validation of the MS-TDS will add a new clinical decision tool to the armamentarium of practicing MS neurologists, from which newly diagnosed MS patients may benefit.
Background:
Multiple sclerosis (MS) is a chronic neuroinflammatory disease affecting about 2.8 million people worldwide. Disease course after the most common diagnoses of relapsing-remitting multiple sclerosis (RRMS) and clinically isolated syndrome (CIS) is highly variable and cannot be reliably predicted. This impairs early personalized treatment decisions.
Objectives:
The main objective of this study was to algorithmically support clinical decision-making regarding the options of early platform medication or no immediate treatment of patients with early RRMS and CIS.
Design:
Retrospective monocentric cohort study within the Data Integration for Future Medicine (DIFUTURE) Consortium.
Methods:
Multiple data sources of routine clinical, imaging and laboratory data derived from a large and deeply characterized cohort of patients with MS were integrated to conduct a retrospective study to create and internally validate a treatment decision score [Multiple Sclerosis Treatment Decision Score (MS-TDS)] through model-based random forests (RFs). The MS-TDS predicts the probability of no new or enlarging lesions in cerebral magnetic resonance images (cMRIs) between 6 and 24 months after the first cMRI.
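The study uses model-based random forests, a specialized method not reproduced here; the following minimal sketch instead uses a standard random forest classifier to illustrate the general workflow of obtaining a cross-validated AUROC from out-of-fold predicted probabilities. All data are synthetic placeholders.

```python
import numpy as np
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import cross_val_predict
from sklearn.metrics import roc_auc_score

# Placeholder predictor matrix (65 routine clinical/imaging/lab features) and binary outcome
# (1 = no new or enlarging lesions between months 6 and 24). Synthetic data for illustration only.
rng = np.random.default_rng(42)
X = rng.normal(size=(475, 65))
y = rng.binomial(1, 0.5, size=475)

rf = RandomForestClassifier(n_estimators=500, min_samples_leaf=10, random_state=42)

# Cross-validated out-of-fold probabilities give an honest estimate of discrimination.
oof_prob = cross_val_predict(rf, X, y, cv=5, method="predict_proba")[:, 1]
print(f"cross-validated AUROC = {roc_auc_score(y, oof_prob):.3f}")
```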
Results:
Data from 65 predictors collected for 475 patients between 2008 and 2017 were included. No medication and platform medication were administered to 277 (58.3%) and 198 (41.7%) patients, respectively. The MS-TDS predicted individual outcomes with a cross-validated area under the receiver operating characteristic curve (AUROC) of 0.624. The respective RF prediction model provides patient-specific MS-TDS values and probabilities of treatment success. The latter may increase by 5–20% for half of the patients if the treatment considered superior by the MS-TDS is used.
Conclusion:
Routine clinical data from multiple sources can be successfully integrated to build prediction models to support treatment decision-making. In this study, the resulting MS-TDS estimates individualized treatment success probabilities that can identify patients who benefit from early platform medication. External validation of the MS-TDS is required, and a prospective study is currently being conducted. In addition, the clinical relevance of the MS-TDS needs to be established.
Background:
On the way towards a commonly agreed framework for auditing ML algorithms, in our previous paper we proposed a 30-question core criteria catalog. In this paper, we apply our catalog to an early sepsis onset detection system use case.
Methods:
The assessment of the ML algorithm behind the sepsis prediction system takes place as a kind of external audit. We apply the questions of our catalog, with the described context, to the publicly available sepsis project resources. For the audit process we considered the three steps proposed by the Supreme Audit Institutions of Finland et al. and utilized inter-rater reliability techniques. We also conducted an extensive reproduction study, as encouraged by our catalog, including data perturbation experiments.
Results:
We were able to successfully apply our 30-question catalog to the sepsis ML algorithm development project. Based on the first auditor's ratings, 37% of the questions were rated as fully addressed, 33% as partially addressed and 30% as not addressed. The weighted Cohen's kappa agreement coefficient is κ = 0.51. The focus of the sepsis project is on algorithm design, data properties and assessment metrics. In our reproduction study, using externally validated pooled prediction on the self-attention deep learning model, we achieved an AUC of 0.717 (95% CI, 0.693-0.740) and a PPV of 28.3 (95% CI, 24.5-32.0) at 80% TPR and 18.8% sepsis-case prevalence harmonization. For the lead time to sepsis onset, we could not reproduce meaningful values. In the perturbation experiment, the model showed an AUC of 0.799 (95% CI, 0.756-0.843) with modified input data, in contrast to an AUC of 0.788 (95% CI, 0.743-0.833) with original input data, when trained on the AUMC dataset and validated externally.
Discussion:
The catalog application results are visualized in a radar diagram, allowing an auditor to quickly assess and compare strengths and weaknesses of ML algorithm development or implementation projects. In general, we were able to reproduce the magnitude of the sepsis project's reported performance metrics. However, certain steps of the reproduction study proved challenging due to necessary code changes and dependencies on package versions and the runtime environment. The deviation in the result metrics was −5.83% for the AUC and −11.03% for the PPV, presumably explained by the absence of tuning on our side. The AUC change of 1.45% indicates resilience of the self-attention deep learning model to input data manipulation. An algorithmic error is most likely responsible for the missing lead time to sepsis onset metric. Even though the obtained weighted Cohen's kappa coefficient is interpreted as "fair to good" agreement between both auditors, potential subjectivity remains, leaving room for improvement. This could be mitigated if more groups (multiple auditors) applied our catalog to existing ML development and implementation projects; a subsequent "catalog application guideline" could be established this way. Our activities might also help development or implementation teams prepare for future, legally required audits of their newly created ML algorithms/AI products.
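Two of the quantities reported above, the linearly weighted Cohen's kappa between two auditors and a PPV at a fixed 80% TPR operating point, can be illustrated with standard tooling. The sketch below uses placeholder ratings and synthetic prediction scores, not the actual audit or sepsis data.

```python
import numpy as np
from sklearn.metrics import cohen_kappa_score, roc_curve

# Inter-rater agreement on the 30-question catalog (ordinal: 0 = not, 1 = partially, 2 = fully addressed).
auditor_1 = np.array([2, 2, 1, 0, 2, 1, 1, 0, 2, 0, 1, 2, 2, 1, 0,
                      2, 1, 0, 2, 1, 2, 0, 1, 2, 1, 0, 2, 1, 2, 0])  # placeholder ratings
auditor_2 = np.array([2, 1, 1, 0, 2, 2, 1, 0, 2, 1, 1, 2, 1, 1, 0,
                      2, 1, 1, 2, 1, 2, 0, 0, 2, 1, 0, 2, 2, 2, 0])  # placeholder ratings
kappa = cohen_kappa_score(auditor_1, auditor_2, weights="linear")
print(f"weighted Cohen's kappa = {kappa:.2f}")

# PPV at a fixed 80% TPR operating point for a classifier (synthetic scores and labels).
rng = np.random.default_rng(1)
y_true = rng.binomial(1, 0.188, size=2000)                  # ~18.8% prevalence
scores = np.clip(y_true * 0.3 + rng.normal(0.4, 0.2, 2000), 0, 1)
fpr, tpr, thresholds = roc_curve(y_true, scores)
thr = thresholds[np.searchsorted(tpr, 0.80)]                # first threshold reaching TPR >= 0.80
pred = scores >= thr
ppv = pred[y_true == 1].sum() / pred.sum()                  # TP / (TP + FP)
print(f"PPV at 80% TPR = {ppv:.3f}")
```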