GastroGPT: development and controlled testing of a proof-of-concept customized clinical language model

Simsek, Cem; Ucdal, Mete; de-Madaria, Enrique; Ebigbo, Alanna; Vanek, Petr; Elshaarawy, Omar; Voiosu, Theodor Alexandru; Antonelli, Giulio; Turró, Román; Gisbert, Javier P; Nyssen, Olga P.; Hassan, Cesare; Messmann, Helmut; Jalan, Rajiv

doi:10.1055/a-2637-2163

Background and study aims: Current general-purpose artificial intelligence (AI) large language models (LLMs) demonstrate limited efficacy in clinical medicine, often constrained to question-answering, documentation, and literature summarization roles. We developed GastroGPT, a proof-of-concept specialty-specific, multi-task, clinical LLM, and evaluated its performance against leading general-purpose LLMs across key gastroenterology tasks and diverse case scenarios.

Methods: In this structured analysis, GastroGPT was compared with three state-of-the-art general-purpose LLMs (LLM-A: GPT-4, LLM-B: Bard, LLM-C: Claude). Models were assessed on seven clinical tasks and overall performance across 10 simulated gastroenterology cases varying in complexity, frequency, and patient demographics. Standardized prompts facilitated structured comparisons. A blinded expert panel rated model outputs per task on a 10-point Likert scale, judging clinical utility. Comprehensive statistical analyses were conducted.

Results: A total of 2,240 expert ratings were obtained. GastroGPT achieved significantly higher mean overall scores (8.1 ± 1.8) compared with GPT-4 (5.2 ± 3.0), Bard (5.7 ± 3.3), and Claude (7.0 ± 2.7) (all P < 0.001). It outperformed comparators in six of seven tasks (P < 0.05), the exception being follow-up planning. GastroGPT demonstrated superior score consistency (variance 34.95) versus general models (97.4-260.35) (P < 0.001). Its performance remained consistent across case complexities and frequencies, unlike the comparators (P < 0.001). Multivariate analysis revealed that model type significantly predicted performance (P < 0.001).

Conclusions: This study pioneered the development of a specialty-specific, clinically oriented AI model and its comparison with general-purpose LLMs. GastroGPT demonstrated superior utility overall and on key gastroenterology tasks, highlighting the potential for tailored, task-focused AI models in medicine.…

Metadata
Author:	Cem Simsek, Mete Ucdal, Enrique de-Madaria, Alanna Ebigbo, Petr Vanek, Omar Elshaarawy, Theodor Alexandru Voiosu, Giulio Antonelli, Román Turró, Javier P Gisbert, Olga P. Nyssen, Cesare Hassan, Helmut Messmann, Rajiv Jalan
URN:	urn:nbn:de:bvb:384-opus4-1249275
Frontdoor URL	https://opus.bibliothek.uni-augsburg.de/opus4/124927
ISSN:	2364-3722
ISSN:	2196-9736
Parent Title (English):	Endoscopy International Open
Publisher:	Georg Thieme
Place of publication:	Stuttgart
Type:	Article
Language:	English
Year of first Publication:	2025
Publishing Institution:	Universität Augsburg
Release Date:	2025/09/09
Volume:	13
First Page:	a26372163
DOI:	https://doi.org/10.1055/a-2637-2163
Institutes:	Medizinische Fakultät
	Medizinische Fakultät / Universitätsklinikum
	Medizinische Fakultät / Lehrstuhl für Innere Medizin mit Schwerpunkt Gastroenterologie
Dewey Decimal Classification:	6 Technology, medicine, applied sciences / 61 Medicine and health / 610 Medicine and health
Licence:	CC-BY 4.0: Creative Commons: Attribution (with Print on Demand)

Open Access
