Deep learning approaches for adaptive audio processing and binary classification in digital health

Liu, Shuo

This thesis explores the use of deep learning technology in the fields of audio processing and digital health from the perspectives of problem modelling, data fitting, data augmentation, neural network design and model architecture selection, training objectives and optimisation strategies, among others. To this end, we base our research on a few representative tasks in these two fields, specifically audio enhancement and tasks associated with Coronavirus disease 2019 (COVID-19), such as diagnosis of the disease. First, the two topics are explored independently. Then, with a view to deploying such diagnostic models in real-world noisy scenarios, we investigate merging the two themes, namely applying audio enhancement at the front-end to improve the noise robustness of a speech-based COVID-19 diagnostic model.
In particular, for audio enhancement, we first concentrate on achieving a variety of front-end processing goals, such as noise reduction and source separation, through a unified deep learning architecture. The neural network solutions are implemented in our open-source toolkit, the Neural Holistic Audio Enhancement System (N-HANS). Additionally, we propose a joint training method to alleviate the mismatch that arises when an audio enhancement model is coupled with a target application. In this way, we can reduce the obtrusive distortions introduced by the enhancement process, which degrade audio quality and have a detrimental effect on the target application. The target applications used to assess the effectiveness of the proposed training method are Automatic Speech Recognition (ASR), Speech Commands Recognition (SCR), Speech Emotion Recognition (SER) and Acoustic Scene Classification (ASC). These representative applications span classification and regression problems, English and non-English languages, and speech and non-speech tasks.
Regarding the COVID-19 tasks, we first look for potential methods of diagnosing the illness using signals from smart or wearable devices. For this purpose, we extend the range of signal types studied beyond speech to include, for example, respiratory sounds such as coughing and breathing, as well as other bio-signals such as heartbeats. In addition, we explore deep learning based solutions for mask-wearing detection from speech, that is, determining whether a speaker is wearing a mask while speaking, in order to create an automated, real-time, effective and affordable solution that helps curb the spread of COVID-19. Speech is a useful alternative source for COVID-19 detection; however, it is frequently affected by ambient noise such as that found in everyday life.
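The joint training idea, and the coupling of an enhancement front-end with a noise-sensitive back-end, can be pictured as optimising one combined objective. The Python sketch below is a minimal illustration under stated assumptions, not the thesis's implementation: it assumes a mean-squared-error enhancement loss against a clean reference, a binary cross-entropy loss standing in for the back-end task (a binary decision, e.g. COVID-19 positive or negative; tasks such as ASR would use a different loss), and a hypothetical weighting parameter `alpha`. The names `mse`, `binary_cross_entropy` and `joint_loss` are illustrative only.

```python
import math
from typing import Sequence


def mse(estimate: Sequence[float], target: Sequence[float]) -> float:
    """Enhancement loss: mean squared error between enhanced and clean signals."""
    if len(estimate) != len(target) or not target:
        raise ValueError("signals must be non-empty and of equal length")
    return sum((e - t) ** 2 for e, t in zip(estimate, target)) / len(target)


def binary_cross_entropy(logit: float, label: int) -> float:
    """Back-end loss: numerically stable binary cross-entropy on a raw logit."""
    # Equivalent to -[y*log(sigmoid(x)) + (1-y)*log(1-sigmoid(x))].
    return max(logit, 0.0) - logit * label + math.log1p(math.exp(-abs(logit)))


def joint_loss(enhanced: Sequence[float], clean: Sequence[float],
               logit: float, label: int, alpha: float = 0.5) -> float:
    """Weighted sum of enhancement and back-end losses.

    `alpha` is a hypothetical weighting; the abstract does not say how the
    two objectives are balanced.
    """
    if not 0.0 <= alpha <= 1.0:
        raise ValueError("alpha must lie in [0, 1]")
    return alpha * mse(enhanced, clean) + (1.0 - alpha) * binary_cross_entropy(logit, label)


# Example: a slightly noisy 3-sample "signal" and an undecided back-end (logit 0).
loss = joint_loss([0.1, 0.2, 0.3], [0.0, 0.2, 0.3], logit=0.0, label=1, alpha=0.5)
```

Setting `alpha = 1` recovers enhancement-only training, while smaller values let the back-end loss shape the enhanced output. In a real system, both terms would typically be backpropagated through the enhancement network with an automatic-differentiation framework.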
Our audio enhancement methods can be used to improve the audio fed into the COVID-19 detection model, facilitating its wider use in practical situations. This is accomplished by training our audio enhancement model on large amounts of data; the resulting model outperforms most prior methods while increasing the dependability and robustness of the back-end application.
Despite the apparent lack of a direct connection between the two research topics presented in this thesis, there are connections, similarities and commonalities in their research methodology, specifically in the use of deep learning techniques. In other words, similar or identical deep learning algorithms are applied to both research themes. One such example is the deep fusion method implemented in the N-HANS toolkit: it employs an auxiliary network to acquire contextual information, and it can also be exploited in this work as a data fusion solution for merging the information from two audio types for COVID-19 diagnosis. Overall, the methodologies proposed in this thesis should not be seen as restricted to the problems considered here, but rather, more broadly, as general solutions for a wider range of applications.
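The deep fusion idea can be sketched in plain Python. This is a minimal, hypothetical illustration, assuming that an auxiliary encoder maps a context input (for example a reference recording in an enhancement setting, or features of the second audio type in COVID-19 diagnosis) to an embedding that is concatenated with the main network's features at every layer, with a sigmoid output giving a binary decision. The class `DeepFusionNet`, its untrained random weights and all dimensions are assumptions for illustration; the actual N-HANS architecture may differ.

```python
import math
import random
from typing import List, Sequence, Tuple

Layer = Tuple[List[List[float]], List[float]]  # (weights, bias)


def linear(x: Sequence[float], weights: List[List[float]], bias: List[float]) -> List[float]:
    """Dense layer y = W x + b, with a shape check."""
    if any(len(row) != len(x) for row in weights):
        raise ValueError("input size does not match layer weights")
    return [sum(w * xi for w, xi in zip(row, x)) + b for row, b in zip(weights, bias)]


def relu(x: Sequence[float]) -> List[float]:
    return [max(0.0, v) for v in x]


def sigmoid(z: float) -> float:
    """Numerically stable logistic function."""
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)


def random_layer(n_in: int, n_out: int, rng: random.Random) -> Layer:
    """Untrained layer with small random weights (illustration only)."""
    weights = [[rng.uniform(-0.5, 0.5) for _ in range(n_in)] for _ in range(n_out)]
    return weights, [0.0] * n_out


class DeepFusionNet:
    """Hypothetical main network whose every layer also sees an auxiliary embedding."""

    def __init__(self, in_dim: int, aux_in_dim: int, aux_dim: int,
                 hidden_dims: Sequence[int], seed: int = 0) -> None:
        rng = random.Random(seed)
        # Auxiliary encoder: summarises the context input into an embedding.
        self.aux_encoder = random_layer(aux_in_dim, aux_dim, rng)
        self.layers: List[Layer] = []
        prev = in_dim
        for h in hidden_dims:
            # Each main layer receives [main features ; auxiliary embedding].
            self.layers.append(random_layer(prev + aux_dim, h, rng))
            prev = h
        self.head = random_layer(prev + aux_dim, 1, rng)

    def forward(self, main_input: Sequence[float], aux_input: Sequence[float]) -> float:
        """Return the probability of the positive class (e.g. COVID-19 positive)."""
        aux = relu(linear(aux_input, *self.aux_encoder))
        h = list(main_input)
        for weights, bias in self.layers:
            h = relu(linear(h + aux, weights, bias))  # list concatenation = feature fusion
        logit = linear(h + aux, *self.head)[0]
        return sigmoid(logit)


net = DeepFusionNet(in_dim=4, aux_in_dim=3, aux_dim=2, hidden_dims=[8, 4])
p = net.forward([0.1, 0.2, 0.3, 0.4], [0.5, 0.1, -0.2])
```

Injecting the auxiliary embedding at every layer, rather than only at the input, keeps the contextual information available throughout the network; a trained model would learn the weights instead of drawing them at random.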

Author:	Shuo Liu
URN:	urn:nbn:de:bvb:384-opus4-1110492
Frontdoor URL:	https://opus.bibliothek.uni-augsburg.de/opus4/111049
Advisor:	Bjoern Schuller
Type:	Doctoral Thesis
Language:	English
Year of first Publication:	2023
Publishing Institution:	Universität Augsburg
Granting Institution:	Universität Augsburg, Faculty of Applied Computer Science
Date of final exam:	2023/08/16
Release Date:	2024/02/06
GND-Keyword:	Deep learning; speech processing; noise suppression; SARS; diagnostic system
Number of pages:	IX, 150
Institutes:	Faculty of Applied Computer Science
	Faculty of Applied Computer Science / Institute of Computer Science
	Faculty of Applied Computer Science / Institute of Computer Science / Chair of Embedded Intelligence for Health Care and Wellbeing
Dewey Decimal Classification:	0 Computer science, information and general works / 00 Computer science, knowledge and systems / 004 Data processing and computer science
Licence:	German copyright law (Deutsches Urheberrecht)

Open Access
