Identifying languages in a novel dataset: ASMR-whispered speech

Song, Meishu; Yang, Zijiang; Parada-Cabaleiro, Emilia; Jing, Xin; Yamamoto, Yoshiharu; Schuller, Björn

Introduction: The Autonomous Sensory Meridian Response (ASMR) is a combination of sensory phenomena involving electrostatic-like tingling sensations, which emerge in response to certain stimuli. Despite the overwhelming popularity of ASMR on social media, no open-source databases of ASMR-related stimuli are yet available, which leaves this phenomenon mostly inaccessible to the research community and thus almost completely unexplored. In this regard, we present the ASMR Whispered-Speech (ASMR-WS) database. Methods: ASMR-WS is a novel database of whispered speech, specifically tailored to promote the development of ASMR-like unvoiced Language Identification (unvoiced-LID) systems. The ASMR-WS database encompasses 38 videos, with a total duration of 10 h and 36 min, and includes seven target languages (Chinese, English, French, Italian, Japanese, Korean, and Spanish). Along with the database, we present baseline results for unvoiced-LID on the ASMR-WS database.
Results: Our best results on the seven-class problem, obtained with 2 s segments, MFCC acoustic features, and a CNN classifier, reached an unweighted average recall of 85.74% and an accuracy of 90.83%. Discussion: In future work, we would like to examine the duration of speech samples more closely, as results varied across the combinations applied herein. To enable further research in this area, the ASMR-WS database, together with the partitioning used for the presented baseline, is made accessible to the research community.
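The abstract reports both unweighted average recall (UAR) and accuracy. As a minimal sketch of how the two metrics differ, the snippet below computes both on a small set of invented labels. The function names and the toy labels are assumptions made for illustration only; they are not taken from the paper or from the ASMR-WS database.

```python
from collections import defaultdict


def accuracy(y_true, y_pred):
    """Fraction of all samples that were classified correctly."""
    correct = sum(t == p for t, p in zip(y_true, y_pred))
    return correct / len(y_true)


def unweighted_average_recall(y_true, y_pred):
    """Mean of the per-class recalls, so every class counts equally."""
    total = defaultdict(int)  # number of samples per true class
    hits = defaultdict(int)   # correctly classified samples per true class
    for t, p in zip(y_true, y_pred):
        total[t] += 1
        if t == p:
            hits[t] += 1
    recalls = [hits[c] / total[c] for c in total]
    return sum(recalls) / len(recalls)


# Toy labels invented for illustration (NOT data from ASMR-WS):
# an imbalanced two-class case with one error in the smaller class.
y_true = ["English"] * 8 + ["Korean"] * 2
y_pred = ["English"] * 8 + ["Korean", "English"]

acc = accuracy(y_true, y_pred)                    # 9 of 10 correct
uar = unweighted_average_recall(y_true, y_pred)   # mean of 8/8 and 1/2
print(f"accuracy = {acc:.2f}, UAR = {uar:.2f}")
```

In this toy case a single error in the smaller class lowers UAR much more than accuracy. An uneven class distribution is one possible reason for the two figures to differ; the abstract itself does not state why they do.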

Metadata
Author:	Meishu Song, Zijiang Yang, Emilia Parada-Cabaleiro, Xin Jing, Yoshiharu Yamamoto, Björn Schuller
URN:	urn:nbn:de:bvb:384-opus4-1061331
Frontdoor URL:	https://opus.bibliothek.uni-augsburg.de/opus4/106133
ISSN:	1662-453X
Parent Title (English):	Frontiers in Neuroscience
Publisher:	Frontiers Media SA
Place of publication:	Lausanne
Type:	Article
Language:	English
Year of first Publication:	2023
Publishing Institution:	Universität Augsburg
Release Date:	2023/07/18
Tag:	General Neuroscience
Volume:	17
First Page:	1120311
DOI:	https://doi.org/10.3389/fnins.2023.1120311
Institutes:	Fakultät für Angewandte Informatik
	Fakultät für Angewandte Informatik / Institut für Informatik
	Fakultät für Angewandte Informatik / Institut für Informatik / Lehrstuhl für Embedded Intelligence for Health Care and Wellbeing
Dewey Decimal Classification:	0 Computer science, information and general works / 00 Computer science, knowledge and systems / 004 Data processing and computer science
Licence:	CC-BY 4.0: Creative Commons: Attribution (with Print on Demand)

Open Access
