Towards speech-only opinion-level sentiment analysis
- The growing popularity of various forms of Spoken Dialogue Systems (SDS) raises the demand for their capability of implicitly assessing the speaker’s sentiment from speech only. Mapping the latter onto user preferences enables the system to adapt to the user and individualize the requested information while increasing user satisfaction. In this paper, we explore the integration of rank consistent ordinal regression into a speech-only sentiment prediction task performed by ResNet-like systems. Furthermore, we use speaker verification extractors trained on larger datasets as low-level feature extractors. An improvement in performance is shown by fusing sentiment and pre-extracted speaker embeddings, reducing the speaker bias of sentiment predictions. Numerous experiments on the Multimodal Opinion Sentiment and Emotion Intensity (CMU-MOSEI) database show that we beat the baselines of state-of-the-art unimodal approaches. Using speech as the only modality combined with optimizing an order-sensitive objective function gets significantly closer to the sentiment analysis results of state-of-the-art multimodal systems.
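For readers unfamiliar with the "rank consistent ordinal regression" and "order-sensitive objective" mentioned in the abstract, the following is a minimal, hypothetical sketch of a CORAL-style ordinal head with a cumulative binary cross-entropy loss, plus an illustrative fusion of a speech-derived sentiment embedding with a pre-extracted speaker embedding. All names, dimensions, and the 7-level sentiment scale are assumptions for illustration, not the authors' exact implementation.

```python
# Minimal sketch (assumptions): CORAL-style rank-consistent ordinal regression
# head on top of a pooled speech embedding, optionally concatenated with a
# pre-extracted speaker embedding. Dimensions and names are placeholders.
import torch
import torch.nn as nn

class OrdinalSentimentHead(nn.Module):
    """Predicts K ordered sentiment levels via K-1 cumulative binary logits.

    CORAL-style rank consistency: a single shared weight vector plus K-1
    threshold biases, so the cumulative probabilities P(y > k) stay ordered.
    """
    def __init__(self, embed_dim: int, num_levels: int = 7):
        super().__init__()
        self.shared = nn.Linear(embed_dim, 1, bias=False)          # shared weights
        self.biases = nn.Parameter(torch.zeros(num_levels - 1))    # one bias per threshold

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        # x: (batch, embed_dim) -> logits: (batch, num_levels - 1)
        return self.shared(x) + self.biases

def coral_loss(logits: torch.Tensor, cum_targets: torch.Tensor) -> torch.Tensor:
    """Order-sensitive objective: binary cross-entropy over cumulative targets.

    cum_targets: (batch, K-1) binary matrix with cum_targets[i, k] = 1 iff y_i > k.
    """
    return nn.functional.binary_cross_entropy_with_logits(logits, cum_targets.float())

# Illustrative usage: fuse a 256-d speech sentiment embedding with a 192-d
# speaker embedding (both dimensions assumed) and train on 7 sentiment levels.
sentiment_emb = torch.randn(8, 256)
speaker_emb = torch.randn(8, 192)
head = OrdinalSentimentHead(embed_dim=256 + 192, num_levels=7)
logits = head(torch.cat([sentiment_emb, speaker_emb], dim=-1))

targets = torch.randint(0, 7, (8,))                                 # ordinal labels 0..6
cum_targets = (targets.unsqueeze(1) > torch.arange(6).unsqueeze(0)) # (8, 6) cumulative form
loss = coral_loss(logits, cum_targets)
loss.backward()
```

The sketch only shows the general CORAL idea of sharing one weight vector across thresholds so that predictions respect the ordering of sentiment levels; the paper's actual ResNet-like backbone, feature extraction, and fusion strategy are described in the publication itself.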
Author: | Annalena Aicher, Alisa Gazizullina, Aleksei Gusev, Yuri Matveev, Wolfgang Minker |
---|---|
URN: | urn:nbn:de:bvb:384-opus4-1229635 |
Frontdoor URL: | https://opus.bibliothek.uni-augsburg.de/opus4/122963 |
URL: | https://aclanthology.org/2022.lrec-1.215/ |
ISBN: | 979-10-95546-72-6 |
Parent Title (English): | Proceedings of the Thirteenth Language Resources and Evaluation Conference, LREC 2022, 20-25 June 2022, Marseille, France |
Publisher: | European Language Resources Association (ELRA) |
Place of publication: | Paris |
Editor: | Nicoletta Calzolari, Frédéric Béchet, Philippe Blache, Khalid Choukri, Christopher Cieri, Thierry Declerck, Sara Goggi, Hitoshi Isahara, Bente Maegaard, Joseph Mariani, Hélène Mazo, Jan Odijk, Stelios Piperidis |
Type: | Conference Proceeding |
Language: | English |
Date of Publication (online): | 2025/06/21 |
Year of first Publication: | 2022 |
Publishing Institution: | Universität Augsburg |
Release Date: | 2025/06/25 |
First Page: | 2000 |
Last Page: | 2006 |
Institutes: | Fakultät für Angewandte Informatik |
| Fakultät für Angewandte Informatik / Institut für Informatik |
| Fakultät für Angewandte Informatik / Institut für Informatik / Lehrstuhl für Menschzentrierte Künstliche Intelligenz |
Dewey Decimal Classification: | 0 Computer science, information & general works / 00 Computer science, knowledge & systems / 004 Data processing; computer science |