Factors in crowdsourcing for evaluation of complex dialogue systems

  • In the last decade, crowdsourcing has become a popular method for conducting quantitative empirical studies in human-machine interaction. Remote work on a given task in crowdworking settings suits the character of typical speech/language-based interactive systems, for instance with regard to argumentative conversations and information retrieval. Thus, crowdworking promises a valuable opportunity to study and evaluate the usability and user experience of real humans interacting with such systems. In contrast to physical attendance in laboratory studies, crowdsourcing studies offer much more flexible and easier access to large numbers of heterogeneous participants with a specific background, e.g., native speakers or domain experts. On the other hand, the experimental and environmental conditions, as well as the participants' compliance and reliability (at least the monitoring of the latter), can be controlled much better in a laboratory. This paper presents a (self-)critical examination of crowdsourcing-based studies in the context of complex (spoken) dialogue systems. It describes and discusses issues observed in crowdsourcing studies involving complex tasks and suggests solutions to improve and ensure the quality of the study results. Thereby, our work contributes to a better understanding of what needs to be considered when designing and evaluating crowdworker studies for complex dialogue systems.

Metadata
Author:Annalena Bea Aicher, Stefan Hillmann, Isabel Feustel, Thilo Michael, Sebastian Möller, Wolfgang Minker
URN:urn:nbn:de:bvb:384-opus4-1228398
Frontdoor URL:https://opus.bibliothek.uni-augsburg.de/opus4/122839
Parent Title (English):13th International Workshop on Spoken Dialogue Systems Technology (IWSDS 2023), Los Angeles, CA, USA, February 21-24, 2023
Publisher:arXiv
Type:Conference Proceeding
Language:English
Year of first Publication:2024
Publishing Institution:Universität Augsburg
Release Date:2025/06/24
First Page:arXiv:2411.11137
DOI:https://doi.org/10.48550/arXiv.2411.11137
Institutes:Fakultät für Angewandte Informatik
Fakultät für Angewandte Informatik / Institut für Informatik
Fakultät für Angewandte Informatik / Institut für Informatik / Lehrstuhl für Menschzentrierte Künstliche Intelligenz
Dewey Decimal Classification:0 Computer science, information & general works / 00 Computer science, knowledge & systems / 004 Data processing; computer science
Licence:CC-BY-NC-ND 4.0: Creative Commons: Attribution - NonCommercial - NoDerivatives (with Print on Demand)