Markus W. Scheppach, David Rauber, Johannes Stallhofer, Anna Muzalyova, Vera Otten, Carolin Manzeneder, Tanja Schwamberger, Julia Wanzl, Jakob Schlottmann, Vidan Tadic, Andreas Probst, Elisabeth Schnoy, Christoph Römmele, Carola Fleischmann, Michael Meinikheim, Silvia Miller, Bruno Märkl, Andreas Stallmach, Christoph Palm, Helmut Messmann, Alanna Ebigbo
Background and aims
Celiac disease with its endoscopic manifestation of villous atrophy is underdiagnosed worldwide. The application of artificial intelligence (AI) for the macroscopic detection of villous atrophy at routine esophagogastroduodenoscopy may improve diagnostic performance.
Methods
A dataset of 858 endoscopic images from 182 patients with villous atrophy and 846 images from 323 patients with normal duodenal mucosa was collected and used to train a ResNet-18 deep learning model to detect villous atrophy. The algorithm was tested on an external dataset, as were six fellows and four board-certified gastroenterologists. Fellows could consult the AI algorithm’s result during the test. Based on the distribution of these consultations, the test images were stratified into “easy” and “difficult”, and performance was measured separately for each class.
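The abstract does not say how the consultation distribution was turned into the two difficulty classes. As a purely illustrative sketch, the Python below assumes a simple rule: an image counts as “difficult” when at least half of the fellows consulted the AI for it. The function name, the 0.5 threshold, and the example data are all hypothetical and are not taken from the study.

```python
from typing import Dict, List


def stratify_by_consultation(
    consulted: Dict[str, List[bool]], threshold: float = 0.5
) -> Dict[str, str]:
    """Label each image "easy" or "difficult" from fellows' AI consultations.

    consulted maps an image id to one boolean per fellow (True means the
    fellow consulted the AI for that image). The threshold rule is an
    assumption made for illustration, not the study's documented method.
    """
    labels = {}
    for image_id, flags in consulted.items():
        if not flags:
            raise ValueError(f"no consultation records for {image_id}")
        rate = sum(flags) / len(flags)  # fraction of fellows who consulted
        labels[image_id] = "difficult" if rate >= threshold else "easy"
    return labels


# Hypothetical consultation records for three images and six fellows.
example = {
    "img_a": [True, True, False, True, True, False],
    "img_b": [False, False, True, False, False, False],
    "img_c": [True, False, True, False, True, False],
}
labels = stratify_by_consultation(example)
```

Sensitivity, specificity, and accuracy can then be computed separately within each class, which is how a reader's drop in performance on “difficult” images would become visible.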
Results
External validation of the AI algorithm yielded values of 90 %, 76 %, and 84 % for sensitivity, specificity, and accuracy, respectively. Fellows achieved values of 63 %, 72 %, and 67 %, while the corresponding values for experts were 72 %, 69 %, and 71 %. AI consultation significantly improved all performance statistics of the fellows. While fellows and experts showed significantly lower performance on “difficult” images, the performance of the AI algorithm remained stable.
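For readers less familiar with the three reported statistics, the following minimal Python sketch shows how sensitivity, specificity, and accuracy follow from the counts of a binary confusion matrix, with villous atrophy as the positive class. The class name and the counts are hypothetical and chosen only for illustration; they are not the study's data.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ConfusionCounts:
    tp: int  # atrophy images classified as atrophy
    fn: int  # atrophy images classified as normal
    tn: int  # normal images classified as normal
    fp: int  # normal images classified as atrophy


def sensitivity(c: ConfusionCounts) -> float:
    """Fraction of villous atrophy images that were detected."""
    return c.tp / (c.tp + c.fn)


def specificity(c: ConfusionCounts) -> float:
    """Fraction of normal images that were correctly called normal."""
    return c.tn / (c.tn + c.fp)


def accuracy(c: ConfusionCounts) -> float:
    """Fraction of all images that were classified correctly."""
    return (c.tp + c.tn) / (c.tp + c.fn + c.tn + c.fp)


# Hypothetical counts (not from the study): 50 atrophy and 50 normal images.
counts = ConfusionCounts(tp=45, fn=5, tn=30, fp=20)
print(f"sensitivity={sensitivity(counts):.2f}")
print(f"specificity={specificity(counts):.2f}")
print(f"accuracy={accuracy(counts):.2f}")
```

Accuracy depends on the mix of atrophy and normal images in the test set, whereas sensitivity and specificity do not, which is one reason all three are reported together.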
Conclusion
In this study, an AI algorithm outperformed endoscopy fellows and experts in the detection of villous atrophy on endoscopic still images. AI decision support significantly improved the performance of non-expert endoscopists. The stable performance on “difficult” images suggests a further positive add-on effect in challenging cases.