
Lecture video highlights detection from speech

  • In interpersonal co-located and online teaching, lecturers highlight words and sentences in their speech to implicitly communicate that particular content is important. This social behaviour, aimed at capturing students' attention, becomes crucial in distance learning, where the teacher's voice is an essential instrument for maximising students' attention. To enable intelligent systems, such as smart tutors, to understand and replicate this social behaviour, the ability to automatically recognise speech-based highlighting is needed. To this end, we introduce a public corpus for the automatic detection of speech-based highlighting in a learning context. By “highlighting” we refer to the emphasised content, i.e., the important content which lecturers try to emphasise (highlight) by attracting the listeners' attention. The dataset is derived from YouTube tutorial videos featuring 104 different English speakers who cover various disciplines, and it will be made freely available to the community. In addition, to establish a baseline analysis for the corpus, we report on a series of experiments, with the best results achieved by a combination of VGG and transformer architectures. Our initial results of 78.2% Accuracy and 78.8% Unweighted Average Recall (UAR) encourage us to believe that this new dataset will facilitate progress in speech processing research for education.
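To make the reported setup concrete, the following is a minimal sketch of the kind of VGG-plus-transformer classifier the abstract describes. It assumes log-mel spectrogram input and binary segment labels, neither of which the abstract specifies; all layer sizes, the HighlightDetector name, and the input shape are illustrative assumptions, not the authors' implementation.

```python
# Hypothetical sketch (not the paper's code): a VGG-style CNN extracts
# local features from a log-mel spectrogram, a transformer encoder models
# temporal context, and a linear head classifies highlight vs. non-highlight.
import torch
import torch.nn as nn

class HighlightDetector(nn.Module):
    def __init__(self, n_mels=64, d_model=128, n_heads=4, n_layers=2):
        super().__init__()
        # VGG-style blocks: stacked 3x3 convolutions with max pooling.
        self.cnn = nn.Sequential(
            nn.Conv2d(1, 32, 3, padding=1), nn.ReLU(),
            nn.Conv2d(32, 32, 3, padding=1), nn.ReLU(),
            nn.MaxPool2d(2),                      # halves time and mel axes
            nn.Conv2d(32, 64, 3, padding=1), nn.ReLU(),
            nn.Conv2d(64, 64, 3, padding=1), nn.ReLU(),
            nn.MaxPool2d(2),
        )
        self.proj = nn.Linear(64 * (n_mels // 4), d_model)
        encoder_layer = nn.TransformerEncoderLayer(
            d_model=d_model, nhead=n_heads, batch_first=True)
        self.transformer = nn.TransformerEncoder(encoder_layer, n_layers)
        self.head = nn.Linear(d_model, 2)         # highlight vs. non-highlight

    def forward(self, spec):                      # spec: (batch, 1, n_mels, time)
        x = self.cnn(spec)                        # (batch, 64, n_mels/4, time/4)
        x = x.permute(0, 3, 1, 2).flatten(2)      # (batch, time/4, 64 * n_mels/4)
        x = self.transformer(self.proj(x))        # contextualise over time
        return self.head(x.mean(dim=1))           # pooled logits per segment

logits = HighlightDetector()(torch.randn(8, 1, 64, 200))  # 8 example segments
```

As for the reported metric: with two classes, UAR is simply the unweighted mean of the per-class recalls, so a UAR of 78.8% indicates the model recalls highlighted and non-highlighted segments roughly equally well, regardless of class imbalance.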

Metadata
Author:Meishu Song, Ilhan Aslan, Emilia Parada-Cabaleiro, Zijiang Yang, Elisabeth André, Yoshiharu Yamamoto, Björn W. Schuller
Frontdoor URL:https://opus.bibliothek.uni-augsburg.de/opus4/118951
ISBN:978-9-4645-9361-7
Parent Title (English):2024 32nd European Signal Processing Conference (EUSIPCO), Lyon, France, 26-30 August 2024
Publisher:IEEE
Place of publication:Piscataway, NJ
Editor:Patrice Abry, Maria Sabrina Greco, Claire Goursaud, Adrien Meynard
Type:Conference Proceeding
Language:English
Year of first Publication:2024
Publishing Institution:Universität Augsburg
Release Date:2025/02/12
First Page:361
Last Page:365
DOI:https://doi.org/10.23919/eusipco63174.2024.10715058
Institutes:Fakultät für Angewandte Informatik
Fakultät für Angewandte Informatik / Institut für Informatik
Fakultät für Angewandte Informatik / Institut für Informatik / Lehrstuhl für Menschzentrierte Künstliche Intelligenz
Fakultät für Angewandte Informatik / Institut für Informatik / Lehrstuhl für Embedded Intelligence for Health Care and Wellbeing
Sustainability Goals
Sustainability Goals / Goal 4 - Quality Education
Dewey Decimal Classification:6 Technology, medicine, applied sciences / 61 Medicine and health / 610 Medicine and health