Towards automated annotation of infant-caregiver engagement phases with multimodal foundation models

  • Caregiver mental health disorders increase the risk of insecure infant attachment and can negatively impact multiple aspects of child development, including cognitive, emotional, and social growth. Infant-caregiver interactions contain subtle psychological and behavioral cues that reveal these adverse effects, underscoring the need for analytical methods to assess them effectively. The Face-to-Face-Still-Face (FFSF) paradigm is a key approach in psychological research for investigating these dynamics, and the Infant and Caregiver Engagement Phases revised German edition (ICEP-R) annotation scheme provides a structured framework for evaluating FFSF interactions. However, manual annotation is labor-intensive and limits scalability, thus hindering a deeper understanding of early developmental impairments. To address this, we developed a computational method that automates the annotation of caregiver-infant interactions using features extracted from audio-visual foundation models. Our approach was tested on 92 FFSF video sessions. Findings demonstrate that models based on bidirectional LSTM and linear classifiers show varying effectiveness depending on the role and feature modality. Specifically, bidirectional LSTM models generally perform better in predicting complex infant engagement phases across multimodal features, while linear models show competitive performance, particularly with unimodal feature encodings like Wav2Vec2-BERT. To support further research, we share our raw feature dataset annotated with ICEP-R labels, enabling broader refinement of computational methods in this area.
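  The abstract describes sequence classifiers (bidirectional LSTM vs. linear heads) operating on pre-extracted foundation-model features. The following is a minimal illustrative sketch, not the authors' implementation: a PyTorch bidirectional LSTM that maps a sequence of feature frames to per-frame engagement-phase labels. The class name BiLSTMPhaseTagger, the feature dimension, the hidden size, and the number of phases are all assumptions for illustration.

  # Minimal sketch (assumptions, not the paper's code): a bidirectional LSTM
  # tagging each pre-extracted feature frame (e.g. an audio embedding such as
  # one produced by Wav2Vec2-BERT) with an ICEP-R engagement-phase label.
  import torch
  import torch.nn as nn

  class BiLSTMPhaseTagger(nn.Module):
      def __init__(self, feat_dim=1024, hidden_dim=256, num_phases=5):
          super().__init__()
          self.lstm = nn.LSTM(feat_dim, hidden_dim,
                              batch_first=True, bidirectional=True)
          # 2 * hidden_dim because forward and backward states are concatenated
          self.head = nn.Linear(2 * hidden_dim, num_phases)

      def forward(self, x):          # x: (batch, time, feat_dim)
          h, _ = self.lstm(x)        # h: (batch, time, 2 * hidden_dim)
          return self.head(h)        # per-frame phase logits

  if __name__ == "__main__":
      model = BiLSTMPhaseTagger()
      feats = torch.randn(2, 300, 1024)   # two clips, 300 feature frames each
      logits = model(feats)                # (2, 300, num_phases)
      phases = logits.argmax(dim=-1)       # predicted phase index per frame
      print(phases.shape)

  Replacing the LSTM with a frame-wise linear layer over the same features would correspond to the linear baseline mentioned in the abstract.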

Metadata
Author:Daksitha Senel Withanage Don, Dominik Schiller, Tobias Hallmen, Silvan Mertes, Tobias Baur, Florian Lingenfelser, Mitho Müller, Lea Kaubisch, Corinna Reck, Elisabeth André
URN:urn:nbn:de:bvb:384-opus4-1166079
Frontdoor URL:https://opus.bibliothek.uni-augsburg.de/opus4/116607
ISBN:979-8-4007-0462-8
Parent Title (English):ICMI '24: International Conference on Multimodal Interaction, San Jose, Costa Rica, November 4-8, 2024
Publisher:ACM
Place of publication:New York, NY
Type:Conference Proceeding
Language:English
Year of first Publication:2024
Publishing Institution:Universität Augsburg
Release Date:2024/11/15
First Page:428
Last Page:438
DOI:https://doi.org/10.1145/3678957.3685704
Institutes:Fakultät für Angewandte Informatik
Fakultät für Angewandte Informatik / Institut für Informatik
Fakultät für Angewandte Informatik / Institut für Informatik / Lehrstuhl für Menschzentrierte Künstliche Intelligenz
Dewey Decimal Classification:0 Computer science, information & general works / 00 Computer science, knowledge & systems / 004 Data processing; computer science
Licence:CC-BY 4.0: Creative Commons: Attribution (with Print on Demand)