- Early diagnosis of neurodevelopmental disorders in infants relies on accurate analysis of spontaneous movements. Achieving this requires fast and precise pose estimation methods tailored to infant-specific anatomy and motion. This study evaluates several pretrained YOLOv11-pose models for pose estimation in depth video recordings of preterm neonates and infants using the open-source babyPose data set. The fastest model (YOLOv11n-pose) has an inference time of 0.007 seconds. Considering a previously proposed data split without subject-wise separation between training and testing data, the most accurate model (YOLOv11m-pose) has a median root mean squared distance (RMSD) of 2.15. The median Dice Similarity Coefficient (DSC) and Recall (R) of the joints are 0.85 and 0.86, while the median DSC and R of the joint connections are 0.90 and 0.91. Considering a subject-wise separation of training and testing data, the results noticeably degrade, e.g.
to a median DSC and R of the joints of 0.79 and 0.81, while the median DSC and R of the joint connections are 0.75 and 0.79. The present work demonstrates a fast and, compared to the literature, accurate approach to depth-based pose estimation in preterm neonates and infants, paving the way for automated movement analysis as a clinical tool for early detection of developmental impairments. Particularly in semiautomated settings where subject-specific annotations can be provided, the results are convincing. Regarding the ability to generalize, more work is required to improve the results.…
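
The reported metrics can be illustrated with a minimal sketch. The abstract does not give the exact definitions, so this assumes the standard ones: RMSD as the root of the mean squared Euclidean distance between matched predicted and annotated joints, DSC as 2TP / (2TP + FP + FN), and Recall as TP / (TP + FN), both computed over binary joint or joint-connection masks. The function names `rmsd` and `dice_and_recall` and the toy inputs are hypothetical and are not taken from the study.

```python
import math
from statistics import median


def rmsd(pred, truth):
    """Root mean squared Euclidean distance between matched 2D keypoints.

    Assumed definition; `pred` and `truth` are equal-length lists of (x, y).
    """
    if len(pred) != len(truth) or not pred:
        raise ValueError("pred and truth must be non-empty and equally long")
    squared = [(px - tx) ** 2 + (py - ty) ** 2
               for (px, py), (tx, ty) in zip(pred, truth)]
    return math.sqrt(sum(squared) / len(squared))


def dice_and_recall(pred_mask, truth_mask):
    """Dice Similarity Coefficient and Recall for two binary masks.

    Masks are nested lists of 0/1 values with identical shape.
    """
    tp = fp = fn = 0
    for pred_row, truth_row in zip(pred_mask, truth_mask):
        for p, t in zip(pred_row, truth_row):
            if p and t:
                tp += 1      # predicted and annotated
            elif p:
                fp += 1      # predicted only
            elif t:
                fn += 1      # annotated only
    denom = 2 * tp + fp + fn
    dsc = 2 * tp / denom if denom else 1.0            # both masks empty
    recall = tp / (tp + fn) if (tp + fn) else 1.0     # nothing to recall
    return dsc, recall


# Toy data: per-frame RMSD values are aggregated with the median,
# mirroring how the abstract reports median scores.
frame_rmsds = [
    rmsd([(0, 0), (3, 4)], [(0, 0), (0, 0)]),
    rmsd([(1, 1)], [(1, 1)]),
    rmsd([(0, 0)], [(0, 2)]),
]
median_rmsd = median(frame_rmsds)

dsc, recall = dice_and_recall([[1, 1], [0, 0]], [[1, 0], [1, 0]])
```

Aggregating with the median rather than the mean keeps a few badly estimated frames from dominating the summary score, which is one plausible reason for reporting medians.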

