Context: Pediatric differentiated thyroid carcinoma (DTC) often presents with advanced disease but generally has excellent long-term survival. However, recurrence or failure to achieve remission remains relatively frequent, underscoring the need for improved early risk stratification.
Objective: To develop and evaluate an interpretable machine learning model for predicting recurrence or non-remission in pediatric DTC using routine clinical and biochemical variables.
Design and setting: Retrospective analysis of 250 pediatric patients (aged <18 years) enrolled in the (GPOH-)MET Registry (1997-2023). Inclusion required known age at diagnosis and ≥24 months of follow-up. The composite study endpoint was structural recurrence or failure to achieve remission within 24 months of initial therapy.
Methods: An XGBoost classifier was trained on 80% of the data, with the remaining 20% used as an independent test set. Model generalizability was assessed via 50 randomized stratified train-validation splits of the training dataset. SHapley Additive exPlanations (SHAP) were used to interpret feature contributions.
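The splitting scheme described in Methods can be sketched as follows. This is a minimal illustration using only the Python standard library, not the authors' pipeline. The event count (75 of 250), the 20% validation fraction inside each of the 50 splits, and the helper name `stratified_split` are assumptions made for the example; the abstract does not state them.

```python
import random
from collections import defaultdict

def stratified_split(labels, holdout_fraction, rng):
    """Split indices into (kept, held_out) with a similar class ratio in both parts."""
    by_class = defaultdict(list)
    for idx, y in enumerate(labels):
        by_class[y].append(idx)
    kept, held_out = [], []
    for y in sorted(by_class):
        members = by_class[y][:]
        rng.shuffle(members)
        n_out = round(len(members) * holdout_fraction)
        held_out.extend(members[:n_out])
        kept.extend(members[n_out:])
    return sorted(kept), sorted(held_out)

# Hypothetical labels: 250 patients, 75 reaching the composite endpoint (assumed rate).
labels = [1] * 75 + [0] * 175
rng = random.Random(0)

# Step 1: set aside 20% once as the independent test set.
dev_idx, test_idx = stratified_split(labels, 0.20, rng)

# Step 2: 50 randomized stratified train-validation splits of the remaining 80%.
# The 20% validation fraction is an assumption, not taken from the abstract.
dev_labels = [labels[i] for i in dev_idx]
validation_splits = []
for _ in range(50):
    tr, va = stratified_split(dev_labels, 0.20, rng)
    # Map positions within dev_labels back to the original patient indices.
    validation_splits.append(([dev_idx[i] for i in tr], [dev_idx[i] for i in va]))
```

Because the test indices are removed before the repeated splits are drawn, no validation split can include a test-set patient, which is what allows the test set to be called independent.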
Results: The final model achieved an AUROC of 0.86 on the independent test set. Across 50 validation splits, the mean AUROC was 0.82 (SD ± 0.05), sensitivity 0.81 (SD ± 0.09), and specificity 0.64 (SD ± 0.06). SHAP analysis identified younger age at diagnosis (<10 years), elevated postoperative thyroglobulin levels, and distant metastases as the most influential predictors.
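For readers less familiar with the reported metrics, the sketch below shows one way to compute AUROC, sensitivity, and specificity for a single split and then summarize per-split values as mean and SD. The toy labels and scores, the 0.5 decision threshold, the per-split AUROC list, and the use of the sample standard deviation are all assumptions for illustration; the abstract does not specify the threshold or the SD convention.

```python
from statistics import mean, stdev

def auroc(y_true, scores):
    """AUROC as the probability that a random positive outranks a random negative.

    Ties between a positive and a negative score count as half a win.
    """
    pos = [s for y, s in zip(y_true, scores) if y == 1]
    neg = [s for y, s in zip(y_true, scores) if y == 0]
    wins = sum(1.0 if p > n else 0.5 if p == n else 0.0 for p in pos for n in neg)
    return wins / (len(pos) * len(neg))

def sensitivity_specificity(y_true, scores, threshold=0.5):
    """Sensitivity = TP / (TP + FN); specificity = TN / (TN + FP)."""
    preds = [1 if s >= threshold else 0 for s in scores]
    tp = sum(1 for y, p in zip(y_true, preds) if y == 1 and p == 1)
    fn = sum(1 for y, p in zip(y_true, preds) if y == 1 and p == 0)
    tn = sum(1 for y, p in zip(y_true, preds) if y == 0 and p == 0)
    fp = sum(1 for y, p in zip(y_true, preds) if y == 0 and p == 1)
    return tp / (tp + fn), tn / (tn + fp)

# Toy data for one validation split (hypothetical, not from the study).
y_true = [1, 1, 1, 0, 0, 0, 0]
scores = [0.9, 0.7, 0.4, 0.6, 0.3, 0.2, 0.1]

split_auroc = auroc(y_true, scores)
split_sens, split_spec = sensitivity_specificity(y_true, scores)

# Summarizing across splits: hypothetical per-split AUROC values.
per_split_auroc = [0.80, 0.85, 0.78]
auroc_mean = mean(per_split_auroc)
auroc_sd = stdev(per_split_auroc)  # sample SD (n - 1 denominator), an assumption
```

AUROC does not depend on a threshold, whereas sensitivity and specificity do, so the reported 0.81 and 0.64 reflect one particular operating point on the ROC curve.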
Conclusions: This interpretable machine learning model reliably predicts early recurrence or non-remission in pediatric DTC and may complement current risk stratification systems to support personalized, risk-adapted treatment decisions.…

