A non-invasive speech quality evaluation algorithm for hearing aids with multi-head self-attention and audiogram-based features

The speech quality delivered by hearing aids plays a crucial role in determining user acceptance and satisfaction. In contrast to invasive speech quality evaluation methods, which require clean signals as a reference, this paper proposes a non-invasive speech quality evaluation algorithm for hearing aids with multi-head self-attention and audiogram-based features. First, the audiogram of hearing-impaired individuals is extended along the frequency axis, enabling the speech quality evaluation model to learn the frequency-band-specific gain requirements of hearing-impaired individuals. Subsequently, the spectrogram is extracted from the speech signal to be evaluated and combined with the transformed audiogram to form the input features. A network of multiple two-dimensional convolutional modules extracts deep frame-level features. The temporal features are then modeled using bidirectional long short-term memory networks (BiLSTM), while a multi-head self-attention mechanism integrates contextual information, enabling the model to focus on key frames. Experimental results demonstrate that, compared with currently available advanced algorithms, the proposed network exhibits a higher correlation with the Hearing Aid Speech Quality Index (HASQI) and remains robust under various noise conditions.
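The abstract outlines a concrete pipeline: a spectrogram plus a frequency-extended audiogram as input, 2-D convolutional feature extraction, BiLSTM temporal modeling, multi-head self-attention, and a pooled quality score. The record contains no code, so the PyTorch sketch below only illustrates that pipeline under stated assumptions: the class name HASQIPredictor, all layer sizes (conv_channels, lstm_hidden, n_heads), the two-channel stacking of spectrogram and audiogram, and the linear interpolation used to extend the audiogram along the frequency axis are illustrative choices, not the authors' implementation.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F


class HASQIPredictor(nn.Module):
    """Illustrative sketch of the pipeline described in the abstract:
    spectrogram + frequency-extended audiogram -> 2-D conv modules ->
    BiLSTM -> multi-head self-attention -> pooled quality score."""

    def __init__(self, n_freq_bins=257, conv_channels=16, lstm_hidden=128, n_heads=4):
        super().__init__()
        # Two 2-D convolutional modules over (time, frequency);
        # channel count and depth are assumptions, not the paper's values.
        self.conv = nn.Sequential(
            nn.Conv2d(2, conv_channels, kernel_size=3, padding=1),
            nn.BatchNorm2d(conv_channels),
            nn.ReLU(),
            nn.Conv2d(conv_channels, conv_channels, kernel_size=3, padding=1),
            nn.BatchNorm2d(conv_channels),
            nn.ReLU(),
        )
        # BiLSTM models the temporal structure of the frame-level features.
        self.blstm = nn.LSTM(conv_channels * n_freq_bins, lstm_hidden,
                             batch_first=True, bidirectional=True)
        # Multi-head self-attention integrates context across frames.
        self.attn = nn.MultiheadAttention(2 * lstm_hidden, n_heads, batch_first=True)
        self.head = nn.Linear(2 * lstm_hidden, 1)  # one score per frame

    def forward(self, spec, audiogram):
        # spec: (batch, frames, n_freq_bins) log-magnitude spectrogram.
        # audiogram: (batch, n_points) hearing thresholds; linearly
        # interpolated ("extended") along frequency to match the spectrogram.
        b, t, f = spec.shape
        ag = F.interpolate(audiogram.unsqueeze(1), size=f,
                           mode="linear", align_corners=True)   # (b, 1, f)
        ag = ag.expand(b, t, f)                                 # repeat over frames
        x = torch.stack([spec, ag], dim=1)                      # (b, 2, t, f)
        x = self.conv(x)                                        # (b, c, t, f)
        x = x.permute(0, 2, 1, 3).reshape(b, t, -1)             # frame-level features
        x, _ = self.blstm(x)                                    # (b, t, 2*hidden)
        x, _ = self.attn(x, x, x)                               # attend over frames
        return self.head(x).mean(dim=1).squeeze(-1)             # utterance-level score
```

For example, `HASQIPredictor()(torch.randn(2, 100, 257), torch.randn(2, 8))` returns a tensor of shape `(2,)`, one predicted quality score per utterance; the actual model's pooling and output range would follow the HASQI scale targeted in the paper.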

Metadata
Author: Ruiyu Liang, Yue Xie, Jiaming Cheng, Cong Pang, Björn Schuller
Frontdoor URL: https://opus.bibliothek.uni-augsburg.de/opus4/115078
ISSN: 2329-9290
ISSN: 2329-9304
Parent Title (English): IEEE/ACM Transactions on Audio, Speech, and Language Processing
Publisher: Institute of Electrical and Electronics Engineers (IEEE)
Place of publication: New York, NY
Type: Article
Language: English
Year of first Publication: 2024
Release Date: 2024/09/02
Volume: 32
First Page: 2166
Last Page: 2176
DOI: https://doi.org/10.1109/taslp.2024.3378107
Institutes: Fakultät für Angewandte Informatik
Fakultät für Angewandte Informatik / Institut für Informatik
Fakultät für Angewandte Informatik / Institut für Informatik / Lehrstuhl für Embedded Intelligence for Health Care and Wellbeing
Sustainable Development Goals
Sustainable Development Goals / Goal 3 - Good Health and Well-Being