Misspellings in responses to listening comprehension questions: prospects for scoring based on phonetic normalization
Automated scoring systems which evaluate content require robust ways of dealing with form errors. The work presented in this paper is set in the context of scoring learners’ responses to listening comprehension items included in a placement test of German as a foreign language. Based on a corpus of over 3000 responses to 17 questions, by test takers of different language proficiencies, we perform a quantitative analysis of the diversity in misspellings. We evaluate the performance of an off-the-shelf open source spell-checker on our data, showing that around 45% of the reported non-word errors are not correctly accounted for, that is, they are either falsely identified as misspelt or the spell-checker is unable to identify the intended word. We propose to address misspellings in computer-based scoring of constructed response items by means of phonetic normalization. Learner responses transcribed into Soundex codes and into two encodings borrowed from historical linguistics (ASJP and Dolgopolsky’s sound classes) are compared to transcribed reference answers using string distance measures. We show that reliable correlation with teachers’ scores can be obtained; however, similarity thresholds are item-specific.
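
The comparison described in the abstract can be illustrated with a minimal sketch. It assumes the standard American Soundex digit classes, a plain Levenshtein distance, and a length-normalized similarity; the record does not state which Soundex variant, distance measure, or thresholds the authors used, and the function names (`soundex`, `encode`, `levenshtein`, `similarity`) are hypothetical. ASJP and Dolgopolsky encodings are not shown.

```python
import unicodedata

# Assumption: standard American Soundex digit classes; the paper's variant may differ.
_CODES = {}
for _letters, _digit in (("BFPV", "1"), ("CGJKQSXZ", "2"), ("DT", "3"),
                         ("L", "4"), ("MN", "5"), ("R", "6")):
    for _ch in _letters:
        _CODES[_ch] = _digit


def soundex(word: str) -> str:
    """Return a 4-character Soundex code, or '' if the token has no letters."""
    # Uppercase (ß becomes SS), strip diacritics (Ä becomes A), keep A-Z only.
    decomposed = unicodedata.normalize("NFKD", word.upper())
    letters = [c for c in decomposed if "A" <= c <= "Z"]
    if not letters:
        return ""
    code = letters[0]
    prev = _CODES.get(letters[0], "")
    for ch in letters[1:]:
        digit = _CODES.get(ch, "")
        if digit and digit != prev:
            code += digit
        if ch not in "HW":  # H and W do not separate letters with equal codes
            prev = digit
    return (code + "000")[:4]


def encode(text: str) -> str:
    """Encode a response word by word (hypothetical helper)."""
    codes = (soundex(token) for token in text.split())
    return " ".join(c for c in codes if c)


def levenshtein(a: str, b: str) -> int:
    """Edit distance with unit costs for insertion, deletion, substitution."""
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, 1):
        cur = [i]
        for j, cb in enumerate(b, 1):
            cur.append(min(prev[j] + 1, cur[j - 1] + 1, prev[j - 1] + (ca != cb)))
        prev = cur
    return prev[-1]


def similarity(response: str, reference: str) -> float:
    """Similarity in [0, 1] between the phonetic encodings of two answers."""
    a, b = encode(response), encode(reference)
    if not a and not b:
        return 1.0
    return 1.0 - levenshtein(a, b) / max(len(a), len(b))
```

A response would then be accepted when `similarity(response, reference)` reaches a cut-off. Since the abstract reports that thresholds are item-specific, such a cut-off would have to be tuned per question rather than fixed globally.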
Author: | Heike da Silva Cardoso, Magdalena Wolska |
---|---|
URN: | urn:nbn:de:bvb:384-opus4-1134421 |
Frontdoor URL | https://opus.bibliothek.uni-augsburg.de/opus4/113442 |
URL: | https://ep.liu.se/en/conference-article.aspx?series=ecp&issue=114&Article_No=2 |
ISBN: | 978-91-7519-036-5 |
ISSN: | 1650-3686 |
Parent Title (English): | Proceedings of the 4th workshop on NLP for Computer Assisted Language Learning at NODALIDA 2015, 11th May 2015, Vilnius, Lithuania |
Publisher: | Linköping University Electronic Press |
Place of publication: | Linköping |
Editor: | Elena Volodina, Lars Borin, Ildikó Pilán |
Type: | Conference Proceeding |
Language: | English |
Year of first Publication: | 2015 |
Publishing Institution: | Universität Augsburg |
Release Date: | 2024/06/13 |
First Page: | 1 |
Last Page: | 10 |
Series: | Linköping Electronic Conference Proceedings ; 114:2 |
Series: | NEALT Proceedings Series ; 26:2 |
Institutes: | Universität Serviceeinrichtungen |
 | Universität Serviceeinrichtungen / Universitätsbibliothek |
Dewey Decimal Classification: | 4 Language / 40 Language / 400 Language |