
Development and validation of large language model rating scales for automatically transcribed psychological therapy sessions

• Rating scales have shaped psychological research, but they are resource-intensive and can burden participants. Large Language Models (LLMs) offer a tool to assess latent constructs in text. This study introduces LLM rating scales, which use LLM responses instead of human ratings. We demonstrate this approach with an LLM rating scale measuring patient engagement in therapy transcripts. Automatically transcribed videos of 1,131 sessions from 155 patients were analyzed using DISCOVER, a software framework for local multimodal human behavior analysis. A Llama 3.1 8B model rated 120 engagement items, and the top eight items were averaged into a total score. Psychometric evaluation showed a normal distribution, strong reliability (ω = 0.953), and acceptable fit (CFI = 0.968, SRMR = 0.022), with the exception of RMSEA = 0.108. Validity was supported by significant correlations with engagement determinants (e.g., motivation, r = .413), processes (e.g., between-session efforts, r = .390), and outcomes (e.g., symptoms, r = -.304). Results remained robust across bootstrap resampling and cross-validation, accounting for nested data. The LLM rating scale exhibited strong psychometric properties, demonstrating the potential of the approach as an assessment tool. Importantly, this automated approach uses interpretable items, ensuring a clear understanding of the measured constructs, while supporting local implementation and protecting confidential data.
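The scoring step described in the abstract can be illustrated with a minimal sketch. This is not the authors' published pipeline: it assumes a local Ollama server exposing a Llama 3.1 8B model, and the item wordings, the 1-5 response format, and the score_item/total_score helpers are hypothetical illustrations of the general approach (prompt the LLM once per item, parse a numeric rating, average the retained items into a total score).

# Minimal sketch (not the authors' pipeline): score one transcript with an
# LLM rating scale. Assumes a local Ollama server serving a Llama 3.1 8B
# model; pip install ollama.
import re
import ollama

# Hypothetical item wordings; the study selected the eight best of 120
# candidate items, which are not reproduced here.
RETAINED_ITEMS = [
    "The patient actively engages with the session's topics.",
    "The patient reports efforts made between sessions.",
    "The patient expresses motivation to change.",
]

def score_item(transcript: str, item: str) -> float:
    """Ask the LLM for a 1-5 rating of one item and parse the number."""
    prompt = (
        "Rate the following statement about the therapy transcript on a "
        "scale from 1 (not at all true) to 5 (completely true). "
        f"Answer with a single number only.\n\nStatement: {item}\n\n"
        f"Transcript:\n{transcript}"
    )
    reply = ollama.chat(
        model="llama3.1:8b",
        messages=[{"role": "user", "content": prompt}],
    )["message"]["content"]
    match = re.search(r"[1-5]", reply)  # crude parse; a real pipeline would validate
    if match is None:
        raise ValueError(f"Unparseable rating: {reply!r}")
    return float(match.group())

def total_score(transcript: str) -> float:
    """Average the retained item ratings into the scale's total score."""
    ratings = [score_item(transcript, item) for item in RETAINED_ITEMS]
    return sum(ratings) / len(ratings)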
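For reference, the reliability coefficient reported above, McDonald's omega, has a standard closed form for a single-factor model with standardized loadings λ_i and residual variances ψ_i over the k retained items (the paper's exact estimator may differ):

\omega = \frac{\left(\sum_{i=1}^{k} \lambda_i\right)^{2}}{\left(\sum_{i=1}^{k} \lambda_i\right)^{2} + \sum_{i=1}^{k} \psi_i}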

Metadata
Author:Steffen T. Eberhardt, Antonia Vehlen, Jana Schaffrath, Brian Schwartz, Tobias Baur, Dominik Schiller, Tobias Hallmen, Elisabeth André, Wolfgang Lutz
URN:urn:nbn:de:bvb:384-opus4-1247774
Frontdoor URL:https://opus.bibliothek.uni-augsburg.de/opus4/124777
ISSN:2045-2322
Parent Title (English):Scientific Reports
Publisher:Springer Science and Business Media LLC
Type:Article
Language:English
Year of first Publication:2025
Publishing Institution:Universität Augsburg
Release Date:2025/09/03
Volume:15
Issue:1
First Page:29541
DOI:https://doi.org/10.1038/s41598-025-14923-y
Institutes:Fakultät für Angewandte Informatik
Fakultät für Angewandte Informatik / Institut für Informatik
Fakultät für Angewandte Informatik / Institut für Informatik / Lehrstuhl für Menschzentrierte Künstliche Intelligenz
Dewey Decimal Classification:0 Computer science, information & general works / 00 Computer science, knowledge & systems / 004 Data processing; computer science
Licence:CC-BY 4.0: Creative Commons: Attribution (with Print on Demand)