From proprietary printed forms to standardized digital exchange formats – care transition records to FHIR as an example

The transition from proprietary, paper-based care transition records (CTRs) to standardized digital formats like HL7 FHIR remains a significant challenge for healthcare institutions. Variability in document layouts, coupled with slow adoption of new interoperability standards, complicates efforts to digitize patient records while preserving data integrity and privacy. This study presents a machine learning-based pipeline for automated information extraction from scanned CTRs. Synthetic training data was generated using a custom CTR generator. A Detectron2-based object detection model, integrated with LayoutParser for document structure analysis and Tesseract OCR for text recognition, was trained on this synthetic dataset. Checkbox detection was performed via an image-processing pipeline based on pixel density analysis. Extracted information was mapped to the FHIR-based PIO-ULB (Pflegeinformationsobjekt - Überleitungsbogen) format using a custom serialization tool. A synthetic dataset of 10,000 CTR samples was used for training and evaluation. On synthetic data, the model achieved accuracy, precision, recall, and F1-score of 97%, 98%, 95%, and 97%, respectively; on real-world data, the corresponding values were 85%, 86%, 83%, and 85%. The lower performance on real-world data was attributed to layout variability and scanning artifacts absent from the synthetic training set. The results demonstrate the feasibility of using machine learning for automated extraction and standardization of CTRs, particularly when relying on synthetic data to overcome data privacy constraints during development. While accuracy declines with real-world document variability, the approach provides a possible interim solution for facilitating interoperability in healthcare documentation. Future work will focus on extending data generation to cover complex document layouts and integrating advanced OCR and handwriting recognition methods to further improve extraction performance.
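
As an illustration of the checkbox step named in the abstract, the following is a minimal sketch of pixel density analysis in plain Python. It is not the authors' implementation: the function names (`ink_density`, `is_checked`, `make_box`), the darkness cutoff of 128, the one-pixel margin, and the 15% density threshold are all assumptions chosen for the example. The idea is to count the share of dark pixels inside a detected checkbox region, skipping the printed frame, and to treat the box as ticked when that share exceeds a threshold.

```python
from typing import List, Tuple

Image = List[List[int]]          # grayscale rows, 0 = black, 255 = white
Box = Tuple[int, int, int, int]  # (left, top, right, bottom), right/bottom exclusive


def ink_density(image: Image, box: Box, dark_below: int = 128, margin: int = 1) -> float:
    """Return the fraction of dark pixels inside the box.

    A border of `margin` pixels is skipped so that the printed checkbox
    frame does not count as ink. All defaults are assumed values.
    """
    left, top, right, bottom = box
    left, top = left + margin, top + margin
    right, bottom = right - margin, bottom - margin
    if right <= left or bottom <= top:
        raise ValueError("box is too small for the chosen margin")
    dark = sum(
        1
        for row in image[top:bottom]
        for pixel in row[left:right]
        if pixel < dark_below
    )
    return dark / ((right - left) * (bottom - top))


def is_checked(image: Image, box: Box, min_density: float = 0.15) -> bool:
    """Classify a checkbox as ticked when enough of its interior is dark."""
    return ink_density(image, box) >= min_density


def make_box(size: int, crossed: bool) -> Image:
    """Build a synthetic square checkbox image (hypothetical test helper)."""
    image = [[255] * size for _ in range(size)]
    for i in range(size):
        # Draw the one-pixel black frame.
        image[0][i] = image[size - 1][i] = 0
        image[i][0] = image[i][size - 1] = 0
    if crossed:
        for i in range(1, size - 1):
            # Draw an X through the interior.
            image[i][i] = 0
            image[i][size - 1 - i] = 0
    return image


if __name__ == "__main__":
    region = (0, 0, 8, 8)
    print(is_checked(make_box(8, crossed=False), region))  # False
    print(is_checked(make_box(8, crossed=True), region))   # True
```

In a real scan, the threshold would likely need tuning against scanning noise and partially filled boxes, which the abstract identifies as a source of the lower real-world performance.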

Metadata
Author:Viktor Werlitz, Lukas Kleybolte, Sabahudin Balic, Elisabeth V. Mess, Andreas Mahler, Claudia Reuter, Alexandra Teynor
URN:urn:nbn:de:bvb:384-opus4-1251564
Frontdoor URL:https://opus.bibliothek.uni-augsburg.de/opus4/125156
ISBN:9781643686158
ISSN:0926-9630
ISSN:1879-8365
Parent Title (English):German Medical Data Sciences 2025: GMDS Illuminates Health: proceedings of the 70th Annual Meeting of the German Association of Medical Informatics, Biometry, and Epidemiology e.V. (gmds), Jena, Germany, 7-11 September 2025
Publisher:IOS Press
Place of publication:Amsterdam
Editor:Rainer Röhrig, Thomas Ganslandt, Klaus Jung, Ann-Kristin Kock-Schoppenhauer, Jochem König, Ulrich Sax, Martin Sedlmayr, Cord Spreckelsen, Antonia Zapf
Type:Conference Proceeding
Language:English
Year of first Publication:2025
Publishing Institution:Universität Augsburg
Release Date:2025/09/20
First Page:162
Last Page:169
Series:Studies in Health Technology and Informatics ; 331
DOI:https://doi.org/10.3233/shti251392
Institutes:Medizinische Fakultät
Medizinische Fakultät / Universitätsklinikum
Dewey Decimal Classification:6 Technology, medicine, applied sciences / 61 Medicine and health / 610 Medicine and health
Licence:CC-BY-NC 4.0: Creative Commons: Attribution - NonCommercial (with Print on Demand)