A non-invasive speech quality evaluation algorithm for hearing aids with multi-head self-attention and audiogram-based features

The speech quality delivered by hearing aids plays a crucial role in determining user acceptance and satisfaction. In contrast to invasive speech quality evaluation methods, which require clean signals as a reference, this paper proposes a non-invasive speech quality evaluation algorithm for hearing aids with multi-head self-attention and audiogram-based features. First, the audiogram of hearing-impaired individuals is extended along the frequency axis, enabling the speech quality evaluation model to learn the frequency-band-specific gain requirements of hearing-impaired individuals. Subsequently, the spectrogram is extracted from the speech signal to be evaluated and combined with the transformed audiogram to form the input features. A network of multiple two-dimensional convolutional modules extracts deep frame-level features. The temporal features are then modeled using bidirectional long short-term memory networks (BiLSTM), while a multi-head self-attention mechanism integrates contextual information, enabling the model to focus on key frames. Experimental results demonstrate that, compared with currently available advanced algorithms, the proposed network exhibits a higher correlation with the Hearing Aid Speech Quality Index (HASQI) and remains robust under various noise conditions.
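The abstract outlines a concrete pipeline: a spectrogram plus a frequency-extended audiogram as input, 2-D convolutional feature extraction, BiLSTM temporal modeling, multi-head self-attention, and a pooled quality score. The record contains no code, so the PyTorch sketch below only illustrates that pipeline under stated assumptions: the class name HASQIPredictor, all layer sizes (conv_channels, lstm_hidden, n_heads), the two-channel stacking of spectrogram and audiogram, and the linear interpolation used to extend the audiogram along the frequency axis are illustrative choices, not the authors' implementation.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F


class HASQIPredictor(nn.Module):
    """Illustrative sketch of the pipeline described in the abstract:
    spectrogram + frequency-extended audiogram -> 2-D conv modules ->
    BiLSTM -> multi-head self-attention -> pooled quality score."""

    def __init__(self, n_freq_bins=257, conv_channels=16, lstm_hidden=128, n_heads=4):
        super().__init__()
        # Two 2-D convolutional modules over (time, frequency);
        # channel count and depth are assumptions, not the paper's values.
        self.conv = nn.Sequential(
            nn.Conv2d(2, conv_channels, kernel_size=3, padding=1),
            nn.BatchNorm2d(conv_channels),
            nn.ReLU(),
            nn.Conv2d(conv_channels, conv_channels, kernel_size=3, padding=1),
            nn.BatchNorm2d(conv_channels),
            nn.ReLU(),
        )
        # BiLSTM models the temporal structure of the frame-level features.
        self.blstm = nn.LSTM(conv_channels * n_freq_bins, lstm_hidden,
                             batch_first=True, bidirectional=True)
        # Multi-head self-attention integrates context across frames.
        self.attn = nn.MultiheadAttention(2 * lstm_hidden, n_heads, batch_first=True)
        self.head = nn.Linear(2 * lstm_hidden, 1)  # one score per frame

    def forward(self, spec, audiogram):
        # spec: (batch, frames, n_freq_bins) log-magnitude spectrogram.
        # audiogram: (batch, n_points) hearing thresholds; linearly
        # interpolated ("extended") along frequency to match the spectrogram.
        b, t, f = spec.shape
        ag = F.interpolate(audiogram.unsqueeze(1), size=f,
                           mode="linear", align_corners=True)   # (b, 1, f)
        ag = ag.expand(b, t, f)                                 # repeat over frames
        x = torch.stack([spec, ag], dim=1)                      # (b, 2, t, f)
        x = self.conv(x)                                        # (b, c, t, f)
        x = x.permute(0, 2, 1, 3).reshape(b, t, -1)             # frame-level features
        x, _ = self.blstm(x)                                    # (b, t, 2*hidden)
        x, _ = self.attn(x, x, x)                               # attend over frames
        return self.head(x).mean(dim=1).squeeze(-1)             # utterance-level score
```

For example, `HASQIPredictor()(torch.randn(2, 100, 257), torch.randn(2, 8))` returns a tensor of shape `(2,)`, one predicted quality score per utterance; the actual model's pooling and output range would follow the HASQI scale targeted in the paper.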

Metadata
Author: Ruiyu Liang, Yue Xie, Jiaming Cheng, Cong Pang, Björn Schuller
Frontdoor URL: https://opus.bibliothek.uni-augsburg.de/opus4/115078
ISSN: 2329-9290
ISSN: 2329-9304
Parent Title (English): IEEE/ACM Transactions on Audio, Speech, and Language Processing
Publisher: Institute of Electrical and Electronics Engineers (IEEE)
Place of publication: New York, NY
Type: Article
Language: English
Year of first Publication: 2024
Release Date: 2024/09/02
Volume: 32
First Page: 2166
Last Page: 2176
DOI: https://doi.org/10.1109/taslp.2024.3378107
Institutes: Fakultät für Angewandte Informatik
Fakultät für Angewandte Informatik / Institut für Informatik
Fakultät für Angewandte Informatik / Institut für Informatik / Lehrstuhl für Embedded Intelligence for Health Care and Wellbeing
Sustainable Development Goals
Sustainable Development Goals / Goal 3 - Good Health and Well-Being