Scribosermo: fast speech-to-text models for German and other languages

Recent Speech-to-Text models often require large amounts of hardware resources and are mostly trained on English. This paper presents Speech-to-Text models for German, as well as for Spanish and French, with the following features: (a) They are small and run in real time on microcontrollers like a Raspberry Pi. (b) Starting from a pretrained English model, they can be trained on consumer-grade hardware with a relatively small dataset. (c) The models are competitive with other solutions and outperform them in German. The models thus combine the advantages of other approaches, each of which offers only a subset of these features. Furthermore, the paper provides a new library for handling datasets that focuses on easy extension with additional datasets, and it shows an optimized way to transfer-learn new languages from a pretrained model of another language with a similar alphabet.

Metadata
Author:	Daniel Bermuth, Alexander Poeppel, Wolfgang Reif
URN:	urn:nbn:de:bvb:384-opus4-992202
Frontdoor URL:	https://opus.bibliothek.uni-augsburg.de/opus4/99220
Parent Title (English):	arXiv
Publisher:	arXiv
Type:	Preprint
Language:	English
Year of first Publication:	2021
Publishing Institution:	Universität Augsburg
Release Date:	2022/11/10
First Page:	arXiv:2110.07982
DOI:	https://doi.org/10.48550/arXiv.2110.07982
Institutes:	Fakultät für Angewandte Informatik
	Fakultät für Angewandte Informatik / Institut für Informatik
	Fakultät für Angewandte Informatik / Institut für Software & Systems Engineering
	Fakultät für Angewandte Informatik / Institut für Informatik / Lehrstuhl für Softwaretechnik
	Fakultät für Angewandte Informatik / Institut für Informatik / Lehrstuhl für Softwaretechnik / Lehrstuhl für Softwaretechnik
Dewey Decimal Classification:	0 Computer science, information and general works / 00 Computer science, knowledge and systems / 004 Data processing and computer science
Licence:	CC-BY-SA 4.0: Creative Commons: Attribution - ShareAlike (with Print on Demand)

Open Access