Add fix script for rewriting STA ScheduledStopPoint IDs #114

Open
leonardehrenfried wants to merge 5 commits into MMTIS:binary_relation_serializer from leonardehrenfried:sta-ssp-id

Conversation

@leonardehrenfried

This script rewrites some IDs in STA's EPIP feed.

Comment thread fix/rewrite_sta_ssp_ids.py Outdated


-def _update_refs(obj: Any, id_map: dict[str, str]) -> bool:
+def _update_refs(obj: Any) -> bool:

Contributor

Is your intention here to iterate over all references?

@leonardehrenfried leonardehrenfried May 20, 2026

Author

Yes, everything that can have a reference to the ScheduledStopPoint.

I agree that it's super generic and handles lots of cases that I don't have in my data set.

Contributor

I think you can omit most of the code by using def only_references(deserialized: Tid, serializer: Serializer) -> Generator[tuple[type[EntityStructure], str, str], None, None]:, which does the recursive stuff.
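
For readers following the thread, here is a minimal, self-contained sketch of the general idea: a generator walks an object tree recursively and yields every reference, and the ID rewrite then becomes a short loop over that generator. All class and function names below (VersionOfObjectRef, StopPointInPattern, JourneyPattern, iter_refs, update_refs) are hypothetical stand-ins for illustration; the repository's only_references may traverse and yield differently.

```python
from dataclasses import dataclass, field, fields, is_dataclass
from typing import Any, Iterator, Optional


@dataclass
class VersionOfObjectRef:
    """Hypothetical stand-in for a NeTEx reference structure."""
    ref: str
    name_of_ref_class: Optional[str] = None


@dataclass
class StopPointInPattern:
    id: str
    scheduled_stop_point_ref: VersionOfObjectRef


@dataclass
class JourneyPattern:
    id: str
    points: list[StopPointInPattern] = field(default_factory=list)


def iter_refs(obj: Any) -> Iterator[VersionOfObjectRef]:
    """Yield every reference object reachable from obj, depth-first."""
    if isinstance(obj, VersionOfObjectRef):
        yield obj
    elif is_dataclass(obj) and not isinstance(obj, type):
        for f in fields(obj):
            yield from iter_refs(getattr(obj, f.name))
    elif isinstance(obj, (list, tuple)):
        for item in obj:
            yield from iter_refs(item)


def update_refs(obj: Any, id_map: dict[str, str]) -> bool:
    """Rewrite references listed in id_map; return True if anything changed."""
    changed = False
    for ref in iter_refs(obj):
        new_id = id_map.get(ref.ref)
        if new_id is not None and new_id != ref.ref:
            ref.ref = new_id
            changed = True
    return changed
```

With the traversal isolated in one generator, the fix script itself only needs the mapping from old to new IDs.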

Comment thread fix/rewrite_sta_ssp_ids.py Outdated
@skinkie

skinkie commented May 20, 2026

Contributor

Can you also add a test to this pull request for loading and fixing a file?

@leonardehrenfried

Author

The current feed this operates on is 500 MB. Do you know of a tool for shrinking NeTEx feeds down to a single journey?

@skinkie

skinkie commented May 20, 2026

Contributor

The current feed this operates on is 500 MB. Do you know of a tool for shrinking NeTEx feeds down to a single journey?

See conv.filter_db_to_db :)

@leonardehrenfried

Author

Actually, if the code is good, I would prefer to merge this now. I will send a follow-up with a test.

@skinkie

skinkie commented May 21, 2026

Contributor

Nope, I want to see how it behaves, and prevent regressions.

@leonardehrenfried

Author

Can you point towards an example that I should emulate?

All I can see are tests that appear to be reading from places like /mnt/storage/compressed/ret-epip.lmdb and then never assert anything.

@skinkie

skinkie commented May 21, 2026

Contributor

I want the code to actually run. At this point I don't care about assertions; I care that the code path is exercised. A small subset of 10 stops in a file, loaded into mdbx, fixed, and then exported to XML would be good enough.

@leonardehrenfried

Author

I used the following filter to get a single ServiceJourney:

uv run python -m conv.filter_db_to_db sta.lmdb ServiceJourney it:apb:ServiceJourney:86345-Pizzin-33-1-41880:345D: sta-reduced.lmdb

It produced this: https://p.ip.fi/zbul

Should I commit this to the repo?

@leonardehrenfried

Author

@skinkie Can you look at the test?

def fix_ssp_ids(database: Path) -> None:
    with MdbxStorage(database, readonly=False) as db:
        with db.env.rw_transaction() as txn:
            # TODO: delete the old ScheduledStopPoint objects (no delete API available yet)

Contributor

For deleting, the following steps must be carried out in this order:

  • the id of the object itself must be renamed
  • all internal references must be updated, hence at least ScheduledStopPointRef, TimingPointRef with nameOfRefClass="ScheduledStopPoint", and ObjectRef (NoticeAssignment); rewriting them should cause the referencing to be updated
  • the old relationship between objects must be deleted
  • the key with the old object must be deleted

We have avoided such operations: we fill a new database with the context and do not try to do such invasive operations in place.
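
To make the ordering concrete, the four steps can be sketched on a toy in-memory store. This is only an illustration under stated assumptions: ToyStore, its objects and relations fields, the "refs" list on each object, and rename_in_place are all hypothetical and do not reflect the project's MdbxStorage API, which (per the TODO above) has no delete operation yet.

```python
from dataclasses import dataclass, field


@dataclass
class ToyStore:
    """Hypothetical in-memory stand-in; not the project's storage API."""
    objects: dict[str, dict] = field(default_factory=dict)  # key -> object
    # Each relation is (key of referrer, key of referenced object).
    relations: set[tuple[str, str]] = field(default_factory=set)


def rename_in_place(store: ToyStore, old_id: str, new_id: str) -> None:
    if new_id in store.objects:
        raise ValueError(f"{new_id} already exists")

    # 1. Rename the id of the object itself (a copy stored under the new key).
    store.objects[new_id] = dict(store.objects[old_id], id=new_id)

    # 2. Update every internal reference and record the new relationships.
    stale = {(src, dst) for (src, dst) in store.relations if old_id in (src, dst)}
    for src, dst in stale:
        new_src = new_id if src == old_id else src
        new_dst = new_id if dst == old_id else dst
        referrer = store.objects[new_src]
        referrer["refs"] = [new_id if r == old_id else r for r in referrer.get("refs", [])]
        store.relations.add((new_src, new_dst))

    # 3. Delete the old relationships between objects.
    store.relations -= stale

    # 4. Delete the key that still holds the old object.
    del store.objects[old_id]
```

Skipping step 3 or 4 in this toy model leaves a dangling relation or a duplicate object behind, which is the kind of inconsistency the ordering is meant to prevent.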

Author

So you never really delete objects, but simply filter them out when copying to a new database?

@skinkie skinkie Jun 25, 2026

Contributor

There are two facets here. We have always worked by transferring from database to database when doing any transformation, so from NeTEx to NeTEx. The (inline) fix operations work well on the attribute level, such as projecting all coordinates from a national grid to WGS84.

What you are doing here would match something like the EPIP conversion. Do all the transformations, write the output into the second database, and copy_map everything that remains stable. https://github.com/MMTIS/badger/blob/binary_relation_serializer/conv/epip_db_to_db.py#L181

The effect is that anything related to referential relationships is never updated, only created.

So in effect, the code to achieve such a thing is virtually the same, but the source is copied, transformed, and then written to the target.

The second facet is that we have always overwritten the key. That is not the case when the id is changed, because the key then changes as well.
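
A minimal sketch of the database-to-database approach described here, using plain dicts as stand-ins for the source and target databases. The function name transform_db_to_db, the "id" and "refs" fields, and the dict-based storage are assumptions made for illustration; the real conversion in conv/epip_db_to_db.py is structured differently.

```python
def transform_db_to_db(
    source: dict[str, dict],
    target: dict[str, dict],
    id_map: dict[str, str],
) -> None:
    """Copy every object from source to target, rewriting ids and references.

    The source is only read. Objects not affected by id_map are copied as-is.
    """
    for obj in source.values():
        new_obj = dict(obj)
        new_obj["id"] = id_map.get(obj["id"], obj["id"])
        new_obj["refs"] = [id_map.get(r, r) for r in obj.get("refs", [])]
        # The key follows the (possibly new) id, so the target never
        # contains the old key and nothing has to be deleted afterwards.
        target[new_obj["id"]] = new_obj
```

Because the target starts empty, relationships are only ever created there, and the changed key is not a problem: the old key simply never gets written.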
