Scribosermo: fast speech-to-text models for German and other languages

  • Recent Speech-to-Text models often require a large amount of hardware resources and are mostly trained in English. This paper presents Speech-to-Text models for German, as well as for Spanish and French with special features: (a) They are small and run in real-time on microcontrollers like a RaspberryPi. (b) Using a pretrained English model, they can be trained on consumer-grade hardware with a relatively small dataset. (c) The models are competitive with other solutions and outperform them in German. In this respect, the models combine advantages of other approaches, which only include a subset of the presented features. Furthermore, the paper provides a new library for handling datasets, which is focused on easy extension with additional datasets and shows an optimized way for transfer-learning new languages using a pretrained model from another language with a similar alphabet.

Download full text files

Export metadata

Statistics

Number of document requests

Additional Services

Share in Twitter Search Google Scholar
Metadaten
Author:Daniel BermuthGND, Alexander PoeppelORCiDGND, Wolfgang ReifORCiDGND
URN:urn:nbn:de:bvb:384-opus4-992202
Frontdoor URLhttps://opus.bibliothek.uni-augsburg.de/opus4/99220
Parent Title (English):arXiv
Publisher:arXiv
Type:Preprint
Language:English
Year of first Publication:2021
Publishing Institution:Universität Augsburg
Release Date:2022/11/10
First Page:arXiv:2110.07982
DOI:https://doi.org/10.48550/arXiv.2110.07982
Institutes:Fakultät für Angewandte Informatik
Fakultät für Angewandte Informatik / Institut für Informatik
Fakultät für Angewandte Informatik / Institut für Software & Systems Engineering
Fakultät für Angewandte Informatik / Institut für Informatik / Lehrstuhl für Softwaretechnik
Fakultät für Angewandte Informatik / Institut für Informatik / Lehrstuhl für Softwaretechnik / Lehrstuhl für Softwaretechnik
Dewey Decimal Classification:0 Informatik, Informationswissenschaft, allgemeine Werke / 00 Informatik, Wissen, Systeme / 004 Datenverarbeitung; Informatik
Licence (German):CC-BY-SA 4.0: Creative Commons: Namensnennung - Weitergabe unter gleichen Bedingungen (mit Print on Demand)