A deep audiovisual approach for human confidence classification

Research on self-efficacy and confidence spans several subfields of psychology and neuroscience. A person's confidence plays a crucial role in the formation of attitude and communication skills, and distinguishing between levels of confidence is therefore important in this domain. With recent advances in extracting behavioral insights from signals in multiple applications, confidence detection has gained practical importance; one prominent application is detecting confidence in interview conversations. We have collected an audiovisual data set of interview conversations with 34 candidates. Every response from each candidate in this data set is labeled with one of three levels of confidence: high, medium, or low. Furthermore, we have developed algorithms to efficiently compute such behavioral confidence from speech and video. A deep learning architecture is proposed for detecting confidence levels (high, medium, and low) from an audiovisual clip recorded during an interview. The achieved unweighted average recall (UAR) reaches 85.9% on audio data and 73.6% on video data captured from an interview session.
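The keyword list names a Bi-LSTM and a deep neural network, but this frontdoor page does not reproduce the architecture itself. Below is a minimal sketch of what an audio-side Bi-LSTM classifier for the three confidence classes could look like, assuming per-frame acoustic features such as 40 MFCC coefficients; the class name, layer sizes, and mean-over-time pooling are illustrative assumptions, not the authors' published configuration.

import torch
import torch.nn as nn

class BiLSTMConfidenceClassifier(nn.Module):
    """Hypothetical sketch: Bi-LSTM over per-frame acoustic features."""
    def __init__(self, n_features=40, hidden_size=64, n_classes=3):
        super().__init__()
        # Bidirectional LSTM reads the frame sequence in both directions.
        self.lstm = nn.LSTM(n_features, hidden_size,
                            batch_first=True, bidirectional=True)
        # Forward and backward states are concatenated, hence 2 * hidden_size.
        self.head = nn.Linear(2 * hidden_size, n_classes)

    def forward(self, x):
        # x: (batch, time, n_features)
        out, _ = self.lstm(x)      # (batch, time, 2 * hidden_size)
        pooled = out.mean(dim=1)   # average the hidden states over time
        return self.head(pooled)   # logits for high / medium / low

model = BiLSTMConfidenceClassifier()
clips = torch.randn(8, 300, 40)    # 8 clips, 300 frames, 40 features per frame
print(model(clips).shape)          # torch.Size([8, 3])

Mean pooling over time is only one plausible readout; taking the final hidden states or an attention-weighted sum would be equally valid choices in a sketch like this.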
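The reported metric, unweighted average recall (UAR), is the mean of the per-class recalls, so each class counts equally regardless of how many samples it contains; this matters when the three confidence levels are imbalanced, as interview data typically is. A short illustration with made-up labels, using the fact that scikit-learn's macro-averaged recall is exactly UAR:

from sklearn.metrics import recall_score

# Made-up ground truth and predictions for the three confidence classes.
y_true = ["high", "high", "medium", "low", "low", "low"]
y_pred = ["high", "medium", "medium", "low", "low", "high"]

# Per-class recalls here: high 1/2, medium 1/1, low 2/3,
# so UAR = (0.5 + 1.0 + 0.667) / 3.
uar = recall_score(y_true, y_pred, average="macro")
print(f"UAR = {uar:.3f}")  # UAR = 0.722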

Metadata
Author: Sushovan Chanda, Kedar Fitwe, Gauri Deshpande, Björn W. Schuller, Sachin Patel
URN: urn:nbn:de:bvb:384-opus4-979183
Frontdoor URL: https://opus.bibliothek.uni-augsburg.de/opus4/97918
ISSN: 2624-9898
Parent Title (English): Frontiers in Computer Science
Publisher: Frontiers Media S.A.
Place of publication: Lausanne
Type: Article
Language: English
Date of first Publication: 2021/10/13
Publishing Institution: Universität Augsburg
Release Date: 2022/09/16
Tag: confidence classification; Bi-LSTM; audio analysis; video analysis; deep neural network; data collection
Volume: 3
First Page: 674533
DOI: https://doi.org/10.3389/fcomp.2021.674533
Institutes: Fakultät für Angewandte Informatik
Fakultät für Angewandte Informatik / Institut für Informatik
Fakultät für Angewandte Informatik / Institut für Informatik / Lehrstuhl für Embedded Intelligence for Health Care and Wellbeing
Dewey Decimal Classification: 0 Computer science, information & general works / 00 Computer science, knowledge & systems / 000 Computer science, information & general works
Licence: CC-BY 4.0: Creative Commons: Attribution (with Print on Demand)