Joanna Fąferek, Andrzej A. Kononowicz, Nataliia Bogutska, Vital Da Silva Domingues, Nataliia Davydova, Ada Frankowska, Isabel Iguacel, Anja Mayer, Luc Morin, Nataliia Pavlyukovich, Iryna Popova, Tetiana Shchudrova, Małgorzata Sudacka, Renata Szydlak, Inga Hege
Background
Virtual patients (VPs) are useful tools for training medical students’ clinical reasoning abilities. However, creating high-quality, peer-reviewed VPs is time-consuming and resource-intensive. Therefore, the aim of this study was to investigate whether generative artificial intelligence (AI) could facilitate the planning and creation of a diverse collection of VPs suitable for training medical students in clinical reasoning.
Methods
We used ChatGPT to generate a blueprint for 200 diverse VPs that adequately represent the population in Europe. We selected five VPs from the blueprint to be created both by humans and by ChatGPT. We assessed the generated blueprint for representativeness and internal consistency, and we reviewed the VPs in a multi-step, partly blinded process for didactic quality and content accuracy. Finally, we received 44 VP evaluations from medical students.
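As an illustration only (not the study’s actual workflow), a blueprint-generation step of this kind could be scripted with the OpenAI Python client; the model name, system role, and prompt wording below are assumptions made for the sketch:

```python
# Minimal sketch of prompting ChatGPT for a VP blueprint via the OpenAI Python client.
# Model name, roles, and prompt wording are illustrative assumptions, not the study's setup.
from openai import OpenAI

client = OpenAI()  # reads the OPENAI_API_KEY environment variable

blueprint_prompt = (
    "Create a blueprint for 200 diverse virtual patients that adequately "
    "represent the population in Europe. For each patient, give age, sex, "
    "country, presenting complaint, and final diagnosis as one table row."
)

response = client.chat.completions.create(
    model="gpt-4",  # hypothetical model choice
    messages=[
        {"role": "system", "content": "You are an expert in medical education."},
        {"role": "user", "content": blueprint_prompt},
    ],
)

print(response.choices[0].message.content)  # raw blueprint text for human review
```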
Results
The generated blueprint did not meet our expectations in terms of quality or representativeness and showed repetitive patterns and an unusually high number of atypical VP outlines.
The ChatGPT- and human-generated VPs were comparable in didactic quality and medical accuracy: neither contained medically incorrect information, and neither reviewers nor students could discern significant differences between them. However, the five human-created VPs showed greater variety in storytelling, differential diagnoses, and patient-doctor interaction. The ChatGPT-generated VPs also included AI-generated patient images; however, we were unable to generate realistic clinical images.
Conclusions
While we do not consider ChatGPT in its current version capable of generating a realistic blueprint for a VP collection, we believe that the process of prompting, combined with iterative discussion and refinement after each step, is promising and warrants further exploration. Similarly, although ChatGPT-generated VPs can serve as a good starting point, the variety of VP scenarios in a large collection may remain limited without interactions between authors and reviewers to refine them.

