Several countries publish official transit stop data. This data usually differs from crowd-sourced OpenStreetMap stop data.
This project compares official stop data against OpenStreetMap stop data.
This project is based on Tobiko Data's SQLMesh, a data transformation framework. You'll need Python and a couple of Unix tools like make, curl, etc.
To update the OSM PBF file after the initial download, this project uses a dockerized version of pyosmium, so Docker needs to be installed as well.
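Before running any make target, it can help to confirm that the tools mentioned above are available. The following is a minimal sketch, not part of the project: the function name `missing_tools` is hypothetical, and the executable names (in particular `python3` rather than `python`) are assumptions that may differ on your system.

```python
import shutil

# Executable names are assumptions based on the tools named in this README;
# adjust the tuple if your system uses different names (e.g. "python").
REQUIRED_TOOLS = ("python3", "make", "curl", "docker")


def missing_tools(tools=REQUIRED_TOOLS):
    """Return the entries of `tools` that cannot be found on PATH."""
    return [tool for tool in tools if shutil.which(tool) is None]


if __name__ == "__main__":
    missing = missing_tools()
    if missing:
        print("Missing tools:", ", ".join(missing))
    else:
        print("All listed tools were found on PATH.")
```

An empty result only means the executables exist on PATH; it does not check versions or whether the Docker daemon is running.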
To compare the German DELFI zHV dataset against OSM, you'll need to download the zHV dataset, the DELFI GTFS dataset (which we use to derive preceding and subsequent stops and route types served at a stop) as well as the OSM dataset germany-latest.pbf from Geofabrik. Finally, a dataset providing names and codes of German districts needs to be downloaded from BKG as well.
To download these, a simple
$ make download
should be sufficient.
To compare the official data against OSM data, run make compare:
$ make compare
This will create the DuckDB database (db_de.db) and perform the matching. Depending on your machine, this will take a couple of minutes.
To create reports documenting the comparison results, you may run the scripts/generate_reports.py via
$ make generate-reports
It will render an overview index.html and, for every district, a detailed HTML report as well as a CSV file, which you can import into a GIS, for example, for further analysis.
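The per-district CSV files can also be inspected without a GIS. The sketch below uses only the Python standard library to count how often each value occurs in one column. It assumes the files are comma-separated, UTF-8 encoded and have a header row; the function name `count_by_column` is hypothetical, and the column layout of the generated files is not documented here, so check the header of an actual file before picking a column.

```python
import csv
from collections import Counter


def count_by_column(csv_path, column):
    """Count how often each value of `column` occurs in a CSV file.

    Assumes a comma-separated, UTF-8 encoded file with a header row.
    """
    with open(csv_path, newline="", encoding="utf-8") as handle:
        reader = csv.DictReader(handle)
        # fieldnames is None for an empty file.
        if reader.fieldnames is None or column not in reader.fieldnames:
            raise KeyError(f"column {column!r} not found in {csv_path}")
        return Counter(row[column] for row in reader)
```

For example, `count_by_column("some_district.csv", "some_column")` returns a `Counter` mapping each distinct value to its number of rows. Both names in that call are placeholders, not names produced by the project.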
While this project is currently focused on German stop data, it may be adapted to analyse other countries' stop data. For details on how to proceed, see other_countries.md.
In case of problems, the troubleshooting guide may be helpful. If you think you have encountered a bug, please report it.