Fix Windows path collision in hotel invoice data #2

Open
yishangupenn wants to merge 12 commits into main from upstream-pr-2495

yishangupenn commented Mar 10, 2026

Copied from upstream: openai/openai-cookbook#2495
Original author: @itsryu-dq
Originally opened: 2026-03-06


Summary

This PR fixes a cross-platform checkout failure caused by a trailing-space path collision under:

examples/data/hotel_invoices/

The repository previously contained two directory paths:

  • examples/data/hotel_invoices/extracted_invoice_json/
  • examples/data/hotel_invoices/extracted_invoice_json / (with a space before the final slash)

The second directory contained a trailing ASCII space in the name. Because both directories contained the same 31 filenames, Windows path normalization collapsed them into identical paths, causing checkout failures on native Windows environments.
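
To illustrate the collapse described above, here is a minimal Python sketch. It assumes Windows normalization can be approximated by stripping trailing spaces and periods from each path component. The helper name approx_windows_path is hypothetical and is not taken from this PR; the filename comes from the checkout error quoted under Motivation.

```python
def approx_windows_path(path: str) -> str:
    """Approximate Win32 normalization by dropping trailing spaces and
    periods from every path component. Hypothetical helper, illustration only."""
    parts = [part.rstrip(" .") for part in path.split("/")]
    return "/".join(parts)


canonical = "examples/data/hotel_invoices/extracted_invoice_json/20190119_002_extracted.json"
trailing = "examples/data/hotel_invoices/extracted_invoice_json /20190119_002_extracted.json"

# Git and Linux treat these as two different paths...
print(canonical != trailing)
# ...but under the approximation above they normalize to the same string.
print(approx_windows_path(canonical) == approx_windows_path(trailing))
```

Because every one of the 31 filenames existed in both directories, each file would hit this same collision.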

This PR removes the trailing-space dataset tree and keeps the canonical dataset path already referenced in the repository:

examples/data/hotel_invoices/extracted_invoice_json/

Additionally, this PR introduces a repository path portability guard and CI validation to prevent similar filesystem portability issues from occurring again.


Motivation

This change restores compatibility for contributors using native Windows environments.

Windows does not support path components that end with a trailing space: the Win32 path APIs strip trailing spaces during normalization. When both directories existed in the repository, they resolved to the same path during checkout on Windows, causing failures such as:


error: invalid path 'examples/data/hotel_invoices/extracted_invoice_json /20190119_002_extracted.json'
fatal: unable to checkout working tree

This issue originated from commit:

ffdd52937d0c82d4fe3e85314ad88439c4a0e3ce

which was merged through:

PR #1273 – "Data Extraction & Transformation with GPT-4o"

openai#1273

The PR was opened and merged by charu-openai, with several content commits contributed by charuj and reviewed by msingh-openai.

Because Linux filesystems allow trailing-space directory names while Windows does not, the issue remained invisible until a Windows checkout attempted to materialize the working tree.


Changes in this PR

  • removed examples/data/hotel_invoices/extracted_invoice_json / (the trailing-space directory)
  • retained canonical dataset path examples/data/hotel_invoices/extracted_invoice_json/
  • added .github/scripts/check_path_portability.py
  • added CI validation to detect:
    • trailing-space path components
    • trailing-period path components
    • Windows reserved device names
    • Windows-normalized path collisions
  • integrated the portability check into .github/workflows/validate-notebooks.yaml
  • added cross-platform path safety documentation to:
    • README.md
    • CONTRIBUTING.md
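
The four checks listed above could be sketched roughly as follows. This is not the contents of .github/scripts/check_path_portability.py, which is not shown in this description; the function name find_portability_problems, the problem labels, and the choice to lowercase paths when detecting collisions (on the assumption that Windows filesystems are case-insensitive by default) are all illustrative assumptions. The input is assumed to be a list of repository-relative paths with forward slashes, such as the output of git ls-files.

```python
from collections import defaultdict

# Reserved DOS device names; matching ignores case and any file extension.
RESERVED = (
    {"CON", "PRN", "AUX", "NUL"}
    | {f"COM{i}" for i in range(1, 10)}
    | {f"LPT{i}" for i in range(1, 10)}
)


def find_portability_problems(paths):
    """Return a sorted list of (path, problem) tuples. Hypothetical sketch."""
    problems = []
    by_normalized = defaultdict(list)
    for path in paths:
        parts = path.split("/")
        for part in parts:
            if part.endswith(" "):
                problems.append((path, "trailing space"))
            elif part.endswith("."):
                problems.append((path, "trailing period"))
            # "aux.txt" is still reserved, so compare only the stem.
            if part.split(".")[0].rstrip(" ").upper() in RESERVED:
                problems.append((path, "reserved device name"))
        # Assumed normalization: strip trailing spaces/periods, fold case.
        normalized = "/".join(p.rstrip(" .").lower() for p in parts)
        by_normalized[normalized].append(path)
    for group in by_normalized.values():
        if len(group) > 1:
            problems.extend((p, "normalized collision") for p in group)
    return sorted(set(problems))


# Made-up sample paths, for illustration only.
sample = [
    "data/invoices/a.json",
    "data/invoices /a.json",
    "docs/aux.txt",
]
for path, problem in find_portability_problems(sample):
    print(f"{problem}: {path!r}")
```

A CI step built on a check like this would typically exit with a non-zero status when the returned list is non-empty, so the workflow fails before an unportable path is merged.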

Result

After this change, the repository can be checked out in full on:

  • Windows
  • macOS
  • Linux
  • WSL

without requiring sparse checkout or filesystem workarounds.

The CI portability guard prevents similar path portability issues from being introduced in future commits.


For new content

This PR does not add new cookbook content and only addresses repository portability and infrastructure.

  • I have added a new entry in registry.yaml so that my content renders on the cookbook website.

  • I have conducted a self-review of my content based on the contribution guidelines:

    • Relevance
    • Uniqueness
    • Spelling and Grammar
    • Clarity
    • Correctness
    • Completeness

Not applicable for this PR.

erikakettleson-openai and others added 12 commits March 3, 2026 09:10
Co-authored-by: Minhajul Hoque <84698472+minh-hoque@users.noreply.github.com>
Co-authored-by: Copilot <175728472+Copilot@users.noreply.github.com>
Co-authored-by: minh-hoque <minh.hoque@gmail.com>
Co-authored-by: Tom Pakeman <tompakeman@openai.com>
Co-authored-by: Copilot <175728472+Copilot@users.noreply.github.com>
Add author metadata for ryu-omnithrex so the GPT-5.4 prompting guide notebook displays proper attribution on cookbook.openai.com.
Co-authored-by: Annika Brundyn, Kathy Lau and Nish Singaraju
yishangupenn pushed a commit that referenced this pull request Mar 10, 2026
…89be-5bd8110247a2

Add CONTRIBUTORS.md file with comprehensive contributor acknowledgment
yishangupenn pushed a commit that referenced this pull request Mar 10, 2026