Julia Böhnke, Antonia Zapf, Katharina Kramer, Philipp Weber, Louisa Bode, Marcel Mast, Antje Wulff, Michael Marschollek, Sven Schamer, Henning Rathert, Thomas Jack, Philipp Beerbaum, Nicole Rübsamen, Julia Böhnke, André Karch, Pronaya Prosun Das, Lena Wiese, Christian Groszweski-Anders, Andreas Haller, Torsten Frank, André Karch, Nicole Rübsamen
Objectives
In this study, we evaluate how to estimate diagnostic test accuracy (DTA) correctly in the presence of longitudinal patient data (ie, repeated test applications per patient).
Study Design and Setting
We used a nonparametric approach to estimate the sensitivity and specificity of three tests for different target conditions with varying characteristics (ie, episode length and disease-free intervals between episodes): 1) systemic inflammatory response syndrome (n = 36), 2) depression (n = 33), and 3) epilepsy (n = 30). DTA was estimated on the levels ‘time’, ‘block’, and ‘patient-time’ for each diagnosis, representing different research questions. The estimation was conducted for the time units per minute, per hour, and per day.
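As a minimal sketch of why the time unit can matter, the Python snippet below pools repeated observations across patients and computes sensitivity and specificity as simple proportions. It then recomputes them after merging observations into coarser time units. The record layout, the function names `coarsen` and `pooled_accuracy`, the toy data, and the rule that a coarse unit counts as positive if any of its finer units is positive are all illustrative assumptions. They are not the estimators or the level definitions used in the study.

```python
from collections import defaultdict


def coarsen(records, factor):
    """Merge fine-grained records into coarser time units.

    records: iterable of (patient_id, time_index, reference_positive, test_positive).
    Assumption for illustration: a coarse unit is positive (for the reference
    standard or for the test) if any of its fine units is positive.
    """
    merged = defaultdict(lambda: [False, False])
    for patient, t, ref, test in records:
        key = (patient, t // factor)
        merged[key][0] = merged[key][0] or ref
        merged[key][1] = merged[key][1] or test
    return [(p, t, ref, test) for (p, t), (ref, test) in sorted(merged.items())]


def pooled_accuracy(records):
    """Return (sensitivity, specificity) pooled over all patients and time units."""
    tp = fn = tn = fp = 0
    for _, _, ref, test in records:
        if ref and test:
            tp += 1
        elif ref:
            fn += 1
        elif test:
            fp += 1
        else:
            tn += 1
    sens = tp / (tp + fn) if tp + fn else float("nan")
    spec = tn / (tn + fp) if tn + fp else float("nan")
    return sens, spec


# Hypothetical toy data: two patients, eight consecutive time units each.
reference = {"A": [0, 0, 1, 1, 1, 0, 0, 0], "B": [0, 0, 0, 0, 1, 1, 0, 0]}
test = {"A": [0, 1, 0, 1, 1, 0, 0, 0], "B": [0, 0, 0, 0, 0, 1, 1, 0]}

fine = [
    (patient, t, bool(reference[patient][t]), bool(test[patient][t]))
    for patient in reference
    for t in range(8)
]
coarse = coarsen(fine, factor=4)

print(pooled_accuracy(fine))    # accuracy per fine time unit
print(pooled_accuracy(coarse))  # accuracy per coarse time unit
```

In this toy data, test results that are offset by one time unit from the reference standard count as errors at the fine resolution but fall into the same unit after coarsening, so both pooled estimates rise. Pooling also ignores the clustering of observations within patients, which a full analysis would need to account for.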
Results
A comparison of DTA within and across use cases showed that the estimates varied with the level used, the time unit, the resulting number of observations per patient, and the diagnosis-specific characteristics. Intra- and inter-use-case comparisons showed that the time level yielded the highest DTA, particularly for larger time units, and that the patient-time level approximated 50% sensitivity and specificity.
Conclusion
Researchers need to predefine their choices (ie, estimation levels and time units) based on their individual research aims, estimands, and diagnosis-specific characteristics of the target outcomes to make sure that unbiased and clinically relevant measures are communicated. In cases of uncertainty, researchers could report the DTA of the test using more than one estimation level and/or time unit.…

