Driver frustration detection from audio and video in the wild

Abdic, Irman; Fridmann, Lex; McDuff, Daniel; Marchi, Erik; Reimer, Bryan; Schuller, Björn

We present a method for detecting driver frustration from both video and audio streams captured during the driver's interaction with an in-vehicle voice-based navigation system. The video is of the driver's face when the machine is speaking, and the audio is of the driver's voice when he or she is speaking. We analyze a dataset of 20 drivers that contains 596 audio epochs (audio clips, with duration from 1 sec to 15 sec) and 615 video epochs (video clips, with duration from 1 sec to 45 sec). The dataset is balanced across 2 age groups, 2 vehicle systems, and both genders. The model was subject-independently trained and tested using 4-fold cross-validation. We achieve an accuracy of 77.4% for detecting frustration from a single audio epoch and 81.2% for detecting frustration from a single video epoch. We then treat the video and audio epochs as a sequence of interactions and use decision fusion to characterize the trade-off between decision time and classification accuracy, which improved the prediction accuracy to 88.5% after 9 epochs.

Metadata
Author:	Irman Abdic, Lex Fridmann, Daniel McDuff, Erik Marchi, Bryan Reimer, Björn Schuller
URN:	urn:nbn:de:bvb:384-opus4-1028099
Frontdoor URL:	https://opus.bibliothek.uni-augsburg.de/opus4/102809
URL:	https://www.ijcai.org/Abstract/16/195
ISBN:	9781577357766
Parent Title (English):	Proceedings of the Twenty-Fifth International Joint Conference on Artificial Intelligence, New York, NY, USA, 9-15 July 2016, volume 3
Publisher:	AAAI Press / International Joint Conferences on Artificial Intelligence
Place of publication:	Palo Alto, CA
Editor:	Subbarao Kambhampati
Type:	Conference Proceeding
Language:	English
Year of first Publication:	2016
Publishing Institution:	Universität Augsburg
Release Date:	2023/03/15
First Page:	1354
Last Page:	1360
Institutes:	Fakultät für Angewandte Informatik
	Fakultät für Angewandte Informatik / Institut für Informatik
	Fakultät für Angewandte Informatik / Institut für Informatik / Lehrstuhl für Embedded Intelligence for Health Care and Wellbeing
Dewey Decimal Classification:	0 Computer science, information and general works / 00 Computer science, knowledge and systems / 004 Data processing and computer science
Licence (German):	German copyright law (Deutsches Urheberrecht)

Open Access
