Modelling affective states for transportation systems

This thesis focuses on deep learning for the automatic estimation of emotional states. Computational emotion recognition, part of the young and rapidly growing discipline of affective computing, has been applied to a wide variety of fields in recent years, including education, customer services, and digital health and wellbeing. It has the potential to elevate human-machine interaction by enabling systems to sense a user’s emotions and adjust their responses accordingly, for a more natural and intuitive experience. This is especially relevant given the increasing number and complexity of automated systems that we interact with in our daily lives. However, challenges remain for deploying such technology "in the wild", i.e., under realistic conditions. The nature of emotions is not yet fully understood, and their expression and perception are nuanced, diverse, and strongly dependent on context, e.g., cultural setting.
Deep-learning-based models require large amounts of data for training, but in the absence of an objective emotional ground truth, gathering annotations from humans is a labour-intensive and costly process. Furthermore, data collected in a natural setting is noisy, with varying environmental conditions and recording quality. Accessing multiple complementary signals, e.g., vision and audio, can help in this situation. Thus, this thesis examines the multi-modal prediction of emotional states in the wild from cues in the face and voice. Emotions are modelled as value-continuous variables, in particular valence and arousal. Their fluctuation across time is captured through sequence-based processing. To summarise the thesis, the background of emotion theory and signal processing is presented first, followed by the methods used for training the models. Experiments on multiple datasets are conducted and discussed, and recommendations for future research are given. In terms of core contributions, this work examines different sets of features based on fine-tuned CNNs and Transformers, and their multi-modal fusion on challenging, noisy datasets. It analyses in detail the problem of emotion recognition on non-verbal vocal bursts such as laughter or crying, which are less commonly studied than speech. It addresses the issue of cross-cultural emotion recognition via domain adaptation on a multi-cultural dataset, and demonstrates that unlabelled data can be leveraged to boost recognition performance.

Metadata
Author: Vincent Karas
URN: urn:nbn:de:bvb:384-opus4-1195435
Frontdoor URL: https://opus.bibliothek.uni-augsburg.de/opus4/119543
Advisor: Björn Schuller
Type: Doctoral Thesis
Language: English
Year of first Publication: 2025
Publishing Institution: Universität Augsburg
Granting Institution: Universität Augsburg, Fakultät für Angewandte Informatik
Date of final exam: 2025/02/20
Release Date: 2025/04/14
Tag: affective computing; deep learning; emotion recognition
GND-Keyword: Deep Learning; Mensch-Maschine-Kommunikation; Gefühl
Number of pages: xxvii, 126
Institutes: Fakultät für Angewandte Informatik
Fakultät für Angewandte Informatik / Institut für Informatik
Fakultät für Angewandte Informatik / Institut für Informatik / Lehrstuhl für Embedded Intelligence for Health Care and Wellbeing
Dewey Decimal Classification: 0 Computer science, information, general works / 00 Computer science, knowledge, systems / 004 Data processing; computer science
Licence (German): German copyright law (Deutsches Urheberrecht)