Background
Machine learning models that predict hypoxia in patients could enable timely interventions. Because existing approaches are diverse and often generalize poorly, external validation is required.
Objective
This study aimed to validate the generalizability of the SpO2 Waveform ICU Forecasting Technique (SWIFT), a long short-term memory (LSTM) algorithm that predicts SpO2 5 and 30 min in advance, on two external datasets.
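Forecasting SpO2 a fixed time ahead means pairing a window of recent measurements with the value observed at the chosen horizon. The sketch below shows one generic way to build such (history, target) pairs for 5- and 30-step horizons. It assumes a regularly sampled series with one value per minute and a 10-sample lookback. These are hypothetical choices for illustration, and `make_forecast_windows` is a made-up helper. This is not SWIFT's actual preprocessing.

```python
from typing import List, Sequence, Tuple


def make_forecast_windows(
    spo2: Sequence[float],
    lookback: int,
    horizon: int,
) -> List[Tuple[List[float], float]]:
    """Pair each `lookback`-long history with the SpO2 value `horizon` steps
    after the last sample in that history (hypothetical helper)."""
    if lookback < 1 or horizon < 1:
        raise ValueError("lookback and horizon must be positive")
    windows: List[Tuple[List[float], float]] = []
    # The history ends at index end - 1; its target sits at end - 1 + horizon,
    # which must still be inside the series, so end <= len(spo2) - horizon.
    for end in range(lookback, len(spo2) - horizon + 1):
        history = list(spo2[end - lookback:end])
        target = spo2[end - 1 + horizon]
        windows.append((history, target))
    return windows


if __name__ == "__main__":
    # Assumed: 40 one-minute samples from a single (synthetic) stay.
    series = [float(v) for v in range(40)]
    five_min = make_forecast_windows(series, lookback=10, horizon=5)
    thirty_min = make_forecast_windows(series, lookback=10, horizon=30)
    print(len(five_min), len(thirty_min))  # 26 1
```

With the same short recording, the 30-step horizon yields far fewer training pairs than the 5-step horizon. Under these assumptions, a stay must be at least `lookback + horizon` samples long to contribute anything.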
Methods
We trained the SWIFT model on the eICU Collaborative Research Database (eICU-CRD) and validated it on data from the Medical Information Mart for Intensive Care IV (MIMIC-IV) and the Amsterdam University Medical Centers Database (UMCdb). We evaluated SWIFT-5 and SWIFT-30 (the 5- and 30-min horizons) in ventilated and non-ventilated populations.
Results
The sampling procedure substantially reduced the population size for the MIMIC-IV and UMCdb data because of differences in SpO2 measurement frequency. SWIFT performed well on eICU-CRD data but showed reduced performance on MIMIC-IV data, particularly for SWIFT-30. UMCdb validation was promising, with performance comparable to eICU-CRD for ventilated patients. All datasets exhibited high specificity and negative predictive value (NPV), which are critical for building trust in alarms in clinical applications.
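Specificity is TN / (TN + FP), the share of non-hypoxic cases that correctly raise no alarm. NPV is TN / (TN + FN), the probability that a "no alarm" prediction is correct. The sketch below computes both from paired true and forecast SpO2 values. It assumes forecasts are binarized at a hypothetical 90% hypoxia cutoff. The abstract does not state how SWIFT's outputs were classified, so the threshold and helper names are illustrative only.

```python
from typing import Sequence, Tuple

HYPOXIA_THRESHOLD = 90.0  # hypothetical SpO2 cutoff (%), not taken from the study


def confusion_counts(
    true_spo2: Sequence[float],
    pred_spo2: Sequence[float],
    threshold: float = HYPOXIA_THRESHOLD,
) -> Tuple[int, int, int, int]:
    """Return (TP, FP, TN, FN), treating SpO2 below `threshold` as hypoxia."""
    if len(true_spo2) != len(pred_spo2):
        raise ValueError("true and predicted series must have equal length")
    tp = fp = tn = fn = 0
    for actual_value, predicted_value in zip(true_spo2, pred_spo2):
        actual = actual_value < threshold
        predicted = predicted_value < threshold
        if actual and predicted:
            tp += 1
        elif predicted:  # alarm raised, no hypoxia observed
            fp += 1
        elif actual:  # hypoxia observed, no alarm raised
            fn += 1
        else:
            tn += 1
    return tp, fp, tn, fn


def specificity(tn: int, fp: int) -> float:
    """TN / (TN + FP); NaN when there are no negative cases."""
    return tn / (tn + fp) if (tn + fp) else float("nan")


def npv(tn: int, fn: int) -> float:
    """TN / (TN + FN); NaN when the model never predicts 'no hypoxia'."""
    return tn / (tn + fn) if (tn + fn) else float("nan")


if __name__ == "__main__":
    tp, fp, tn, fn = confusion_counts(
        [95, 97, 88, 92, 85, 96],  # synthetic observed SpO2
        [94, 96, 87, 89, 86, 97],  # synthetic forecast SpO2
    )
    print(tp, fp, tn, fn)                   # 2 1 3 0
    print(specificity(tn, fp), npv(tn, fn))  # 0.75 1.0
```

Because hypoxic episodes are typically rarer than normal readings, true negatives dominate both ratios. High specificity and NPV therefore mainly show that the model rarely raises false alarms and that its "no alarm" state can be trusted. They do not, on their own, demonstrate that hypoxia is detected reliably.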
Conclusions
The study highlights the challenges of generalizing prediction models across diverse ICU populations and emphasizes the need for external validation. Further research should focus on improving model adaptability and interpretability, with attention to practical application in clinical settings.…

