Bahn-Vorhersage dataset: an open archive of most train delays in Germany since September 2021

Open data regarding granular train delays in Germany is scarce. This paper presents the longitudinal dataset of train delays gathered for the reliability prediction project Bahn-Vorhersage. Since September 2021, data has been collected via the Deutsche Bahn Timetable API, which provides both scheduled and real-time information for the entire German rail network. By implementing a polling strategy for every station, an almost complete record of train operations has been archived. The raw data is augmented with trip length information derived from OpenStreetMap and parsed into a flattened tabular format optimized for statistical analysis and machine learning applications. The dataset covers the period from September 2021 to the present and is continuously updated. Significant data gaps resulting from hardware or network outages have been documented to ensure transparency. The weekly volume of collected train stops is comparable to the official dataset published by DELFI e.V. via the National Access Point but exhibits greater temporal stability. Furthermore, punctuality statistics derived from the dataset align closely with official figures published by Deutsche Bahn, validating its representativeness. The dataset is freely available under the Open Database License (ODbL).
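
As an illustration of how punctuality statistics might be derived from a flattened table of stops, the following is a minimal sketch and not the dataset's actual schema or the author's method. The `Stop` class and its field names are hypothetical, and the 6-minute threshold is an assumption based on the convention commonly cited for Deutsche Bahn's published punctuality figures (a stop counts as punctual when its delay is below 6 minutes).

```python
from dataclasses import dataclass
from datetime import datetime, timedelta
from typing import Iterable, Optional


@dataclass(frozen=True)
class Stop:
    """One row of a flattened stop table (hypothetical field names)."""
    planned_arrival: datetime
    actual_arrival: Optional[datetime]  # None if no real-time value was recorded


def arrival_delay_minutes(stop: Stop) -> Optional[float]:
    """Return the arrival delay in minutes, or None without real-time data."""
    if stop.actual_arrival is None:
        return None
    return (stop.actual_arrival - stop.planned_arrival).total_seconds() / 60.0


def punctuality_share(stops: Iterable[Stop],
                      threshold_minutes: float = 6.0) -> Optional[float]:
    """Share of stops whose delay is strictly below the threshold.

    Stops without real-time data are skipped. The default threshold is an
    assumption, not a value taken from the dataset documentation.
    """
    delays = [d for d in map(arrival_delay_minutes, stops) if d is not None]
    if not delays:
        return None
    on_time = sum(1 for d in delays if d < threshold_minutes)
    return on_time / len(delays)


if __name__ == "__main__":
    planned = datetime(2024, 1, 1, 12, 0)
    sample = [
        Stop(planned, planned + timedelta(minutes=2)),
        Stop(planned, planned + timedelta(minutes=9)),
    ]
    print(punctuality_share(sample))
```

Skipping stops without real-time data is one possible design choice; treating them as a separate category would be equally defensible, depending on how the documented data gaps should enter the statistic.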

Metadata
Author: Theo Döllmann
Frontdoor URL: https://opus.bibliothek.uni-augsburg.de/opus4/127213
Type: Preprint
Language: English
Date of Publication (online): 2025/12/24
Year of first Publication: 2025
Publishing Institution: Universität Augsburg
Release Date: 2026/01/07
Institutes: Fakultät für Angewandte Informatik
Fakultät für Angewandte Informatik / Institut für Geographie
Dewey Decimal Classification: 0 Computer science, information and general works / 00 Computer science, knowledge and systems / 004 Data processing and computer science
Latest Publications (not yet published in print): Current publications (not yet published in print)
Licence (German): German copyright law (Deutsches Urheberrecht)