Bahn-Vorhersage dataset: an open archive of most train delays in Germany since September 2021
Open data on granular train delays in Germany is scarce. This paper presents a longitudinal dataset of train delays collected for the reliability prediction project Bahn-Vorhersage. Since September 2021, data has been collected via the Deutsche Bahn Timetable API, which provides both scheduled and real-time information for the entire German rail network. By polling the API for every station, an almost complete record of train operations has been archived. The raw data is augmented with trip length information derived from OpenStreetMap and parsed into a flattened tabular format optimized for statistical analysis and machine learning applications. The dataset covers the period from September 2021 to the present and is continuously updated. Significant data gaps caused by hardware or network outages are documented for transparency. The weekly volume of collected train stops is comparable to that of the official dataset published by DELFI e.V.
via the National Access Point but exhibits greater temporal stability. Furthermore, punctuality statistics derived from the dataset closely match the official figures published by Deutsche Bahn, which supports its representativeness. The dataset is freely available under the Open Database License (ODbL).…
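The abstract states that raw API responses are parsed into a flattened tabular format. The following is a minimal sketch of that step, not the project's actual parser. It assumes the XML layout of the Deutsche Bahn Timetables API as we understand it: each stop is an `s` element with a trip label `tl` (category `c`, number `n`) and arrival/departure events `ar`/`dp` carrying a planned time `pt` and a changed time `ct` in `YYMMddHHmm` format. It further assumes that planned and changed times have already been merged into one element (as far as we know, the API delivers them in separate plan and change responses). The names `flatten_stops`, `delay_minutes`, `parse_ts` and the sample XML are hypothetical.

```python
import xml.etree.ElementTree as ET
from datetime import datetime
from typing import Dict, List, Optional

# Assumed timestamp format of the Timetables API: YYMMddHHmm, e.g. "2109011205".
TS_FORMAT = "%y%m%d%H%M"


def parse_ts(value: Optional[str]) -> Optional[datetime]:
    """Parse an API timestamp; return None if the attribute is missing."""
    return datetime.strptime(value, TS_FORMAT) if value else None


def delay_minutes(event: Optional[ET.Element]) -> Optional[float]:
    """Delay of an arrival (ar) or departure (dp) event in minutes: ct - pt."""
    if event is None:
        return None
    planned = parse_ts(event.get("pt"))
    if planned is None:
        return None
    changed = parse_ts(event.get("ct"))
    if changed is None:
        # Hypothetical convention: no changed time means the event ran on schedule.
        return 0.0
    return (changed - planned).total_seconds() / 60


def flatten_stops(xml_text: str) -> List[Dict]:
    """Turn one station's nested timetable XML into one flat row per stop."""
    root = ET.fromstring(xml_text)
    rows = []
    for stop in root.iter("s"):
        label = stop.find("tl")
        arrival = stop.find("ar")
        departure = stop.find("dp")
        rows.append({
            "station": root.get("station"),
            "stop_id": stop.get("id"),
            "category": label.get("c") if label is not None else None,
            "train_number": label.get("n") if label is not None else None,
            "arrival_planned": parse_ts(arrival.get("pt")) if arrival is not None else None,
            "arrival_delay_min": delay_minutes(arrival),
            "departure_delay_min": delay_minutes(departure),
        })
    return rows


# Made-up sample response for illustration only.
SAMPLE_XML = """<timetable station="Hamburg Hbf">
  <s id="sample-1">
    <tl c="ICE" n="123"/>
    <ar pt="2109011200" ct="2109011207"/>
    <dp pt="2109011205" ct="2109011209"/>
  </s>
</timetable>"""

for row in flatten_stops(SAMPLE_XML):
    print(row["category"], row["train_number"], row["arrival_delay_min"], row["departure_delay_min"])
    # → ICE 123 7.0 4.0
```

One row per stop keeps the output directly loadable into a dataframe or CSV file; a real pipeline would also have to handle cancellations, platform changes and the merging of plan and change data, which this sketch leaves out.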
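The abstract validates the dataset by comparing weekly stop volumes and punctuality with official figures. A minimal sketch of such an aggregation over flat rows, using only the standard library, is shown below. It uses Deutsche Bahn's commonly cited punctuality definition (a stop is punctual if the train is less than 6 minutes late) and, as an additional assumption, excludes cancelled stops from the punctuality share while still counting them in the weekly volume. Weeks are ISO calendar weeks of the planned time. The row keys `planned`, `delay_min` and `cancelled`, the function `weekly_stats`, and the sample rows are hypothetical and not the dataset's actual schema.

```python
from collections import defaultdict
from datetime import datetime

# Deutsche Bahn's commonly cited definition: a stop is punctual if the train
# is less than 6 minutes late.
PUNCTUAL_THRESHOLD_MIN = 6.0


def weekly_stats(rows):
    """Aggregate flat stop rows into per-ISO-week stop counts and punctuality.

    Each row is a dict with the hypothetical keys "planned" (datetime),
    "delay_min" (float) and "cancelled" (bool). Cancelled stops count towards
    the weekly volume but are excluded from the punctuality share (assumption).
    """
    stops = defaultdict(int)
    scored = defaultdict(int)
    punctual = defaultdict(int)
    for row in rows:
        year, week, _ = row["planned"].isocalendar()
        key = (year, week)
        stops[key] += 1
        if row["cancelled"]:
            continue
        scored[key] += 1
        if row["delay_min"] < PUNCTUAL_THRESHOLD_MIN:
            punctual[key] += 1
    return {
        key: {
            "stops": stops[key],
            "punctuality": punctual[key] / scored[key] if scored[key] else None,
        }
        for key in sorted(stops)
    }


# Made-up rows for illustration only.
ROWS = [
    {"planned": datetime(2021, 9, 6, 8, 0), "delay_min": 0.0, "cancelled": False},
    {"planned": datetime(2021, 9, 6, 9, 0), "delay_min": 5.9, "cancelled": False},
    {"planned": datetime(2021, 9, 7, 9, 0), "delay_min": 6.0, "cancelled": False},
    {"planned": datetime(2021, 9, 8, 9, 0), "delay_min": 12.0, "cancelled": True},
    {"planned": datetime(2021, 9, 13, 9, 0), "delay_min": 2.0, "cancelled": False},
]

for (year, week), s in weekly_stats(ROWS).items():
    print(year, week, s["stops"], s["punctuality"])
```

In this sample, ISO week 36 of 2021 has four stops; the cancelled stop is left out of the punctuality share and the 6.0-minute delay falls outside the threshold, so two of the three scored stops are punctual.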

