Releases: agude/SWITRS-to-SQLite
v4.5.0: Header-based column resolution for robust CSV parsing
Release Notes:
This release refactors the CSV parsing system to use dynamic header-based column resolution instead of
hardcoded indices. This makes the parser resilient to column reordering in future SWITRS data releases and
includes several performance optimizations for processing large files.
There are no breaking changes to the CLI or database schema in this release.
What's New
- Header-Based Column Resolution: The parser now reads CSV headers at runtime to determine column positions, rather than relying on hardcoded indices. This ensures compatibility if CHP reorders columns in future SWITRS data exports.
- Duplicate Header Detection: Added validation that fails fast with a clear error message if a CSV file contains duplicate column headers, preventing subtle data ingestion bugs.
- Performance Optimizations: Column indices are now pre-calculated during file initialization to eliminate per-row dictionary lookups and iteration. For multi-million row SWITRS files, this reduces overhead in the hot path of `__set_values` and the date conversion methods.
- Automatic BOM Handling: Switched file reading to use `utf-8-sig` encoding, which automatically strips the byte-order mark if present. This is more idiomatic Python than manual string manipulation.
- Empty File Handling: Added graceful handling for empty input files (or files containing only a BOM), which previously caused an unhandled `StopIteration` exception.
- Code Clarity: Renamed `RowClass` to `row_parser` in `main.py` to accurately reflect that these are `CSVParser` instances, not class types.
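
The ideas in this release can be illustrated with a minimal sketch. It is not the project's actual code: `resolve_columns`, `read_rows`, and the sample column names are hypothetical, and the missing-column check is an extra assumption not mentioned in the notes. It shows indices resolved once from the header, a fail-fast duplicate check, BOM stripping via `utf-8-sig`, and an empty input that yields no rows instead of raising `StopIteration`.

```python
import csv
import io


def resolve_columns(header, wanted):
    """Map each wanted column name to its position in the header row."""
    seen = set()
    duplicates = []
    for name in header:
        if name in seen and name not in duplicates:
            duplicates.append(name)
        seen.add(name)
    if duplicates:
        # Fail fast rather than silently reading the wrong column.
        raise ValueError(f"Duplicate column headers: {duplicates}")

    positions = {name: i for i, name in enumerate(header)}
    missing = [name for name in wanted if name not in positions]
    if missing:
        # Assumption: a missing column is also treated as an error.
        raise ValueError(f"Missing column headers: {missing}")
    # Computed once per file so the per-row loop only does list indexing.
    return [positions[name] for name in wanted]


def read_rows(stream, wanted):
    """Yield the wanted fields of each row, in the order given by `wanted`."""
    reader = csv.reader(stream)
    header = next(reader, None)  # None for an empty file, no StopIteration
    if header is None:
        return
    indices = resolve_columns(header, wanted)
    for row in reader:
        yield [row[i] for i in indices]


# Sample data with a BOM and columns in a different order than requested.
raw = "\ufeffparty_number,case_id\n1,0001\n2,0001\n".encode("utf-8")
text = io.StringIO(raw.decode("utf-8-sig"), newline="")  # utf-8-sig strips the BOM
rows = list(read_rows(text, ["case_id", "party_number"]))
```

When reading from disk, the equivalent is `open(path, encoding="utf-8-sig", newline="")`. A file that contains only a BOM decodes to an empty string, so it takes the same early-return path as an empty file.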
Release 4.4.0: `src` Layout Migration & Integration Tests
Release Notes:
This release modernizes the repository structure by migrating to the industry-standard `src` layout, ensuring better isolation between the development environment and the installed package. It also introduces a comprehensive end-to-end integration testing suite using golden snapshots to guarantee database consistency across versions.
There are no breaking changes to the CLI or database schema in this release.
What's New
- `src` Layout Migration: Moved the package source code into a `src/` subdirectory. This prevents common "double import" errors where tests run against the local directory instead of the installed package, and aligns the project with modern Python packaging standards (supported natively by `uv` and `hatch`).
- Golden Snapshot Integration Tests: Added a new end-to-end integration test suite (`tests/test_integration.py`) that converts raw CSV data to SQLite and verifies the result against a `golden_snapshot.json`. This ensures that changes to parsers or converters do not silently alter the resulting database structure or content.
- Test Data Extraction Tools: Added `scripts/extract_test_rows.py`, a utility that analyzes massive SWITRS datasets to greedily select a small subset of rows that maximizes coverage of all internal value mappings (enums). This allows for high-coverage testing with minimal file size.
- Type Hinting Improvements: Updated the strict type checking configuration for `mypy` and added explicit return type annotations (`-> None`) to the test suite.
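
To make the greedy row selection concrete, here is a small sketch of the general technique (greedy set cover). It is not the contents of `scripts/extract_test_rows.py`: the function name `select_rows` and the toy data are hypothetical, and this version rescans all rows on every pass, which would likely be too slow for a real multi-million row file without further work.

```python
def select_rows(rows, columns):
    """Greedily pick rows until every observed (column, value) pair is covered."""
    # Every (column index, value) pair seen anywhere in the data.
    uncovered = {(c, row[c]) for row in rows for c in columns}
    chosen = []
    while uncovered:
        # Pick the row that covers the most still-uncovered pairs.
        best = max(
            rows,
            key=lambda row: len({(c, row[c]) for c in columns} & uncovered),
        )
        chosen.append(best)
        uncovered -= {(c, best[c]) for c in columns}
    return chosen


# Toy data: two categorical columns with two values each.
rows = [
    ("A", "1"),
    ("A", "2"),
    ("B", "2"),
    ("B", "1"),
]
subset = select_rows(rows, columns=[0, 1])
# Two of the four rows are enough to see every value of both columns.
```

Greedy selection does not guarantee the smallest possible subset, but it is simple and usually gets close, which is what matters for keeping test fixtures small.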
v4.3.0: Schema Dataclass Refactor
Release Notes:
This release refactors the internal schema definition system from a tuple-based DSL to typed dataclasses, improving code readability, type safety, and maintainability. Converter functions now use explicit signatures instead of `**kwargs`.
There are no breaking changes to the CLI or database schema in this release.
What's New
- Column Dataclass Schema: Replaced the tuple-based "mystery meat" DSL in `row_types.py` with a frozen `Column` dataclass in the new `schema.py` module. Field definitions are now self-documenting with named attributes (`index`, `name`, `sql_type`, `nulls`, `converter`, `mapping`) instead of positional `tuple` elements.
- Explicit Converter Signatures: Refactored all converter functions from `**kwargs` to explicit (`val`, `dtype`, `nulls`) parameters. This enables IDE autocompletion, catches typos at call sites, and makes function contracts clear.
- Default Identity Converter: Added an `identity()` converter function as the default for `Column`. The parser now unconditionally calls `col.converter()` without `None` checks, simplifying the parsing loop.
- Set-Based Null Checks: Changed `DEFAULT_NULLS` from a `list` to a `set` for O(1) membership lookups. Custom null collections now use `set` union (`|`) instead of `list` concatenation.
- Modern Type Annotations: Updated type hints to use `collections.abc` imports (`Callable`, `Collection`, `Mapping`) instead of `typing` module equivalents. The `nulls` parameter now accepts any `Collection[str]`, allowing sets, tuples, or lists.
- Improved Type Safety: The `Column.mapping` field uses `Mapping[str, str | None]` (covariant) instead of `dict`, allowing the existing value maps to pass type checking without modification.
v4.2.0: Modernize Build System and Development Tooling
Release Notes:
This release modernizes the project's build system and development tooling. It migrates from legacy setup.py to modern Python packaging standards, adds automated linting and formatting, and updates CI/CD workflows.
There are no breaking changes to the CLI or database schema in this release.
What's New
- Modern Python Packaging: Migrated from `setup.py` to `pyproject.toml` using hatchling as the build backend. This removes the pypandoc dependency and enables native markdown README support on PyPI.
- UV Package Manager: Adopted UV for fast, reproducible dependency management. A `uv.lock` file is now included for consistent environments across development and CI.
- Ruff Linting and Formatting: Added Ruff for linting and code formatting. The codebase has been reformatted for consistency, imports are sorted, and string formatting modernized to f-strings.
- Updated CI/CD Workflows: GitHub Actions workflows now use UV for installation and include Ruff checks. The release workflow uses reusable workflows to eliminate duplication. All actions updated to current versions (checkout v6, setup-uv v7, pypi-publish v1.13).
- Dependabot Configuration: Added Dependabot for automated monthly updates to GitHub Actions and UV dependencies.
- Justfile Task Runner: Added a Justfile for common development commands. Run `just` to see available tasks including `test`, `lint`, `format`, `build`, and `check`.
- Python Version Support: Updated supported Python versions to 3.10–3.14. Dropped Python 3.8 and 3.9, which are in security-only maintenance.
4.1.3
Fix a couple of makes that were mapped incorrectly.
Full Changelog: 4.1.2...4.1.3
4.1.2
4.1.1
4.1.0
4.0.0
The first major version change in almost a year! This one includes:
- Almost every categorical column has been mapped to a human-readable string. `chp_road_type` is one exception, because I cannot find information about what the road types are.
- Closed #7: There is now a human-readable `county_location` column.
- Closed #3: `bicycle_collision` is now True/False instead of True/NULL.
- Closed #6: The most common makes are now normalized, but there are still a bunch of missing ones. See new bug: #12