Zero-shot personalization of speech foundation models for depressed mood monitoring

Gerczuk, Maurice; Triantafyllopoulos, Andreas; Amiriparian, Shahin; Kathan, Alexander; Bauer, Jonathan; Berking, Matthias; Schuller, Björn W.

The monitoring of depressed mood plays an important role as a diagnostic tool in psychotherapy. An automated analysis of speech can provide a non-invasive measurement of a patient’s affective state. While speech has been shown to be a useful biomarker for depression, existing approaches mostly build population-level models that aim to predict each individual’s diagnosis as a (mostly) static property. Because of inter-individual differences in symptomatology and mood regulation behaviors, these approaches are ill-suited to detect smaller temporal variations in depressed mood. We address this issue by introducing a zero-shot personalization of large speech foundation models. Compared with other personalization strategies, our work does not require labeled speech samples for enrollment. Instead, the approach makes use of adapters conditioned on subject-specific metadata. On a longitudinal dataset, we show that the method improves performance compared with a set of suitable baselines. Finally, applying our personalization strategy improves individual-level fairness.

Metadata
Author:	Maurice Gerczuk, Andreas Triantafyllopoulos, Shahin Amiriparian, Alexander Kathan, Jonathan Bauer, Matthias Berking, Björn W. Schuller
URN:	urn:nbn:de:bvb:384-opus4-1089748
Frontdoor URL:	https://opus.bibliothek.uni-augsburg.de/opus4/108974
ISSN:	2666-3899
Parent Title (English):	Patterns
Publisher:	Elsevier BV
Type:	Article
Language:	English
Date of first Publication:	2023/11/01
Publishing Institution:	Universität Augsburg
Release Date:	2023/11/09
Tag:	General Decision Sciences
Volume:	4
Issue:	11
First Page:	100873
DOI:	https://doi.org/10.1016/j.patter.2023.100873
Institutes:	Fakultät für Angewandte Informatik
	Fakultät für Angewandte Informatik / Institut für Informatik
	Fakultät für Angewandte Informatik / Institut für Informatik / Lehrstuhl für Embedded Intelligence for Health Care and Wellbeing
Dewey Decimal Classification:	0 Computer science, information and general works / 00 Computer science, knowledge and systems / 004 Data processing; computer science
Licence:	CC-BY 4.0: Creative Commons: Attribution (with Print on Demand)

Open Access
