Fix Windows path collision in hotel invoice data #2495
itsryu-dq wants to merge 5 commits into openai:main
Add author metadata for ryu-omnithrex so the GPT-5.4 prompting guide notebook displays proper attribution on cookbook.openai.com.
💡 Codex Review
Here are some automated review suggestions for this pull request.
Reviewed commit: a7bee06161
|
This PR is ready for maintainer review.
Summary of final updates:
Removed the trailing-space dataset directory:
and retained the canonical dataset path:
The repository previously contained both paths with the same 31 JSON filenames. Because Windows filesystems normalize trailing spaces in path components, this created 31 path collisions and caused checkout failures on native Windows environments. The issue originated from commit:
merged through PR #1273 – Data Extraction & Transformation with GPT-4o.
This PR also adds a repository path portability guard and CI validation to prevent similar cross-platform filesystem issues from being introduced in the future.
Validation performed:
From my side the repository tree is now free of Windows-hostile path components and protected by CI path validation. Happy to adjust anything maintainers prefer regarding dataset handling or documentation wording. |
6589d37 to ad5f52f
|
Final update from my side. Addressed the automated Codex review suggestion regarding control characters in the path portability guard. The check previously exempted
Current PR changes include:
All local checks pass and the portability guard validates the repository tree successfully. From my side this PR is complete and ready for maintainer review. |
Summary
This PR fixes a cross-platform checkout failure caused by a trailing-space path collision under:
examples/data/hotel_invoices/
The repository previously contained two directory paths:
examples/data/hotel_invoices/extracted_invoice_json/
examples/data/hotel_invoices/extracted_invoice_json
The second directory contained a trailing ASCII space in its name. Because both directories contained the same 31 filenames, Windows path normalization collapsed them into identical paths, causing checkout failures on native Windows environments.
This PR removes the trailing-space dataset tree and keeps the canonical dataset path already referenced in the repository:
examples/data/hotel_invoices/extracted_invoice_json/
Additionally, this PR introduces a repository path portability guard and CI validation to prevent similar filesystem portability issues from recurring.
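For readers unfamiliar with this kind of guard, here is a minimal sketch of a per-path check, assuming it flags components that end in a space or contain ASCII control characters. The function name `find_unportable_components` and the file name `invoice.json` are hypothetical; this is not the code in `.github/scripts/check_path_portability.py`, which is not reproduced in this description.

```python
def find_unportable_components(path: str) -> list[str]:
    """Return the components of a repo-relative path that Windows cannot keep as-is.

    Assumption: only two rules are checked here (trailing space, control
    characters); a real guard may cover more cases.
    """
    problems = []
    for part in path.split("/"):
        if part.endswith(" "):
            # Windows path normalization drops trailing spaces from a component.
            problems.append(part)
        elif any(ord(ch) < 32 for ch in part):
            # Characters 0-31 are not allowed in Windows file names.
            problems.append(part)
    return problems


if __name__ == "__main__":
    # Hypothetical file name, used only to illustrate the check.
    print(find_unportable_components(
        "examples/data/hotel_invoices/extracted_invoice_json /invoice.json"
    ))
```

Returning the offending components, rather than a bare boolean, makes it easier to print an actionable message in CI.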
Motivation
This change restores compatibility for contributors using native Windows environments.
Windows filesystems do not support path components that end with a trailing space. When both directories existed in the repository, Windows normalized them to the same path during checkout, causing failures such as:
This issue originated from commit:
ffdd52937d0c82d4fe3e85314ad88439c4a0e3ce
which was merged through:
PR #1273 – "Data Extraction & Transformation with GPT-4o"
The PR was opened and merged by charu-openai, with several content commits contributed by charuj and reviewed by msingh-openai.
Because Linux filesystems allow trailing-space directory names while Windows does not, the issue remained invisible until a Windows checkout attempted to materialize the working tree.
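The collision itself can be illustrated with a short sketch that approximates the Windows view of a path by stripping trailing spaces from every component and then grouping paths that end up identical. The helper names `windows_key` and `find_collisions` and the file name `invoice.json` are hypothetical, and the normalization is deliberately simplified (for example, it ignores case-insensitivity).

```python
from collections import defaultdict


def windows_key(path: str) -> str:
    """Approximate how Windows sees a path: trailing spaces dropped per component."""
    return "/".join(part.rstrip(" ") for part in path.split("/"))


def find_collisions(paths: list[str]) -> dict[str, list[str]]:
    """Group paths that become identical after the simplified normalization."""
    groups: dict[str, list[str]] = defaultdict(list)
    for path in paths:
        groups[windows_key(path)].append(path)
    # Only keys shared by more than one distinct path are collisions.
    return {key: members for key, members in groups.items() if len(members) > 1}


if __name__ == "__main__":
    base = "examples/data/hotel_invoices"
    # Hypothetical file name present in both directories.
    paths = [
        f"{base}/extracted_invoice_json/invoice.json",
        f"{base}/extracted_invoice_json /invoice.json",
    ]
    for key, members in find_collisions(paths).items():
        print(key, "<-", [repr(m) for m in members])
```

On Linux both paths are distinct, so nothing fails there; the collision only appears once the two names are mapped to the same key.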
Changes in this PR
examples/data/hotel_invoices/extracted_invoice_json
examples/data/hotel_invoices/extracted_invoice_json
.github/scripts/check_path_portability.py
.github/workflows/validate-notebooks.yaml
README.md
CONTRIBUTING.md
Result
After this change the repository is fully checkoutable on:
without requiring sparse checkout or filesystem workarounds.
The CI portability guard prevents similar path portability issues from being introduced in future commits.
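As a rough illustration of how such a guard can fail a CI job, the sketch below reads the tracked paths from `git ls-files -z` and returns a non-zero exit code when any path component ends in a space. The function names are hypothetical and the single rule shown is an assumption; the actual workflow step and script in this PR may differ.

```python
import subprocess
import sys


def tracked_paths(raw: bytes) -> list[str]:
    """Split NUL-delimited `git ls-files -z` output into a list of paths."""
    return [
        entry.decode("utf-8", errors="surrogateescape")
        for entry in raw.split(b"\0")
        if entry
    ]


def exit_code(paths: list[str]) -> int:
    """Return 1 if any path has a component ending in a space, otherwise 0."""
    bad = [p for p in paths if any(part.endswith(" ") for part in p.split("/"))]
    for path in bad:
        # repr() makes the otherwise invisible trailing space visible in logs.
        print(f"unportable path: {path!r}", file=sys.stderr)
    return 1 if bad else 0


def main() -> int:
    # -z gives NUL-terminated, unquoted paths, so unusual characters survive.
    raw = subprocess.run(
        ["git", "ls-files", "-z"], check=True, capture_output=True
    ).stdout
    return exit_code(tracked_paths(raw))
```

A workflow step could run this with `sys.exit(main())`, so that any offending path fails the job before it reaches a Windows checkout.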
For new content
This PR does not add new cookbook content and only addresses repository portability and infrastructure.
I have added a new entry in registry.yaml so that my content renders on the cookbook website.
I have conducted a self-review of my content based on the contribution guidelines:
Not applicable for this PR.