Cross-layer similarity knowledge distillation for speech enhancement
Speech enhancement (SE) algorithms based on deep neural networks (DNNs) often face limited hardware resources or strict latency requirements when deployed in real-world scenarios, whereas strong enhancement performance typically requires a large DNN. In this paper, a knowledge distillation framework for SE is proposed to compress the DNN model. We study a cross-layer connection strategy that fuses multi-level information from the teacher and transfers it to the student. To adapt the distillation to the SE task, we propose a frame-level similarity distillation loss. We apply this method to the deep complex convolution recurrent network (DCCRN) and make targeted adjustments. Experimental results show that the proposed method considerably improves the enhancement performance of the compressed DNN and outperforms other distillation methods.
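The abstract only outlines the frame-level similarity distillation loss, so the following is a minimal sketch of one plausible formulation in PyTorch: each time frame of an intermediate feature map is treated as a vector, a frame-by-frame cosine-similarity (Gram) matrix is built for teacher and student, and the two matrices are matched. The Gram-matrix form, the tensor layout, and the name `frame_similarity_loss` are illustrative assumptions, not details taken from the paper.

```python
import torch
import torch.nn.functional as F

def frame_similarity_loss(student_feat: torch.Tensor, teacher_feat: torch.Tensor) -> torch.Tensor:
    """Hypothetical frame-level similarity distillation loss (sketch only).

    Both inputs are intermediate feature maps of shape (batch, channels, time).
    Because the loss compares frame-frame similarity matrices rather than raw
    features, teacher and student may have different channel widths.
    """
    def gram(x: torch.Tensor) -> torch.Tensor:
        # (B, C, T) -> (B, T, C); L2-normalise each frame vector,
        # then compute cosine similarities between all frame pairs.
        x = F.normalize(x.transpose(1, 2), dim=-1)
        return torch.bmm(x, x.transpose(1, 2))  # (B, T, T)

    return F.l1_loss(gram(student_feat), gram(teacher_feat))

# Toy usage with randomly generated "features" from hypothetical encoder layers:
student = torch.randn(4, 64, 100)   # batch=4, 64 channels, 100 frames
teacher = torch.randn(4, 256, 100)  # teacher may be wider; time axis must match
loss = frame_similarity_loss(student, teacher)
```

In a cross-layer setup as described in the abstract, a loss of this kind could be summed over several (student layer, teacher layer) pairs; the exact pairing and fusion used in the paper are not specified here.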
Author: | Jiaming Cheng, Ruiyu Liang, Yue Xie, Li Zhao, Björn Schuller, Jie Jia, Yiyuan Peng |
---|---|
URN: | urn:nbn:de:bvb:384-opus4-992916 |
Frontdoor URL: | https://opus.bibliothek.uni-augsburg.de/opus4/99291 |
Parent Title (English): | Interspeech 2022, Incheon, Korea, 18-22 September 2022 |
Publisher: | ISCA |
Place of publication: | Baixas |
Editor: | Hanseok Ko, John H. L. Hansen |
Type: | Conference Proceeding |
Language: | English |
Year of first Publication: | 2022 |
Publishing Institution: | Universität Augsburg |
Release Date: | 2022/11/15 |
First Page: | 926 |
Last Page: | 930 |
DOI: | https://doi.org/10.21437/interspeech.2022-429 |
Institutes: | Fakultät für Angewandte Informatik; Fakultät für Angewandte Informatik / Institut für Informatik; Fakultät für Angewandte Informatik / Institut für Informatik / Lehrstuhl für Embedded Intelligence for Health Care and Wellbeing |
Dewey Decimal Classification: | 0 Computer science, information & general works / 00 Computer science, knowledge & systems / 004 Data processing; computer science |
Licence (German): | Deutsches Urheberrecht (German copyright law) |