One test, many tongues: surveying language proficiency across the globe

Language influences our thinking and affects many aspects of cognition, from how we perceive the world to how we interact socially. Thus, objectively characterizing linguistic background is crucial for research in many areas, including second language acquisition, psycholinguistics, and cognitive science. Traditional language proficiency tests, however, are manually composed by experts, limiting their scope for both lab and online settings. Here, we propose a pipeline that automatically derives a language proficiency test from a corpus of text and applies it to create new tests for 1,939 languages. Using this approach, we conducted a large-scale survey examining L1 and L2 proficiency across 34 countries, with participants tested on all 34 languages. Drawing from human ratings from 4,137 participants, our results validate that our test can effectively distinguish native speakers, second-language speakers, and nonspeakers within one minute, making it an effective tool for evaluating linguistic proficiency. We show that participants' linguistic and demographic backgrounds systematically influence both their language proficiency and their self-reported skills, and we map the prevalence of global languages, such as English and Spanish, among online participants. Moreover, we show that our vocabulary tests are strongly correlated with other linguistic competences, such as listening and writing, in a set of typologically varied languages, demonstrating our test is an efficient instrument to assess language proficiency. More broadly, our work offers a significant resource for investigating global variation in language skills and contributes to reducing the overreliance on the English language in the cognitive and social sciences.

Metadata
Author:Pol van Rijn, Yue Sun, Harin Lee, Raja Marjieh, Ilia Sucholutsky, Francesca Lanzarini, Elisabeth André, Nori Jacoby
URN:urn:nbn:de:bvb:384-opus4-1296566
Frontdoor URL:https://opus.bibliothek.uni-augsburg.de/opus4/129656
ISSN:0027-8424
ISSN:1091-6490
Parent Title (English):Proceedings of the National Academy of Sciences (PNAS)
Publisher:Proceedings of the National Academy of Sciences
Place of publication:Washington, D.C.
Type:Article
Language:English
Year of first Publication:2026
Publishing Institution:Universität Augsburg
Release Date:2026/04/10
Volume:123
Issue:13
First Page:e2420179123
DOI:https://doi.org/10.1073/pnas.2420179123
Institutes:Fakultät für Angewandte Informatik
Fakultät für Angewandte Informatik / Institut für Informatik
Fakultät für Angewandte Informatik / Institut für Informatik / Lehrstuhl für Menschzentrierte Künstliche Intelligenz
Dewey Decimal Classification:0 Computer science, information & general works / 00 Computer science, knowledge & systems / 004 Data processing & computer science
Licence (German):CC-BY 4.0: Creative Commons: Attribution