
Work on Version 1.17.0 #331

Open
tilo wants to merge 20 commits into main from version-1.17.0


tilo (owner) commented Apr 13, 2026

Previous versions of SmarterCSV required an IO.rewind after the automatic detection of row_sep and col_sep. This limits the use cases of the gem: a rewind only works on seekable input such as files, and we also want to be able to use SmarterCSV on streaming input.
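For context, a minimal Ruby illustration (not code from the gem) of why a rewind is a problem for streaming input: a pipe is not seekable, so bytes that were already read for auto-detection cannot be re-read by rewinding.

```ruby
# Illustration only: bytes read from a pipe cannot be re-read by rewinding.
reader, writer = IO.pipe
writer.write("name,age\nAlice,30\n")
writer.close

header = reader.gets   # consumed, e.g. by separator auto-detection

rewind_failed =
  begin
    reader.rewind
    false
  rescue Errno::ESPIPE # "Illegal seek": the stream is not seekable
    true
  end
reader.close

puts header.inspect    # => "name,age\n"
puts rewind_failed
```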

Work on this branch:

  • no more IO.rewind during auto-detection

This adds a peekable IO buffer that:

  • fetches the first N bytes of the input into the buffer
  • runs the auto-detection on the buffer
  • rewinds the buffer (not the IO)
  • starts the actual CSV processing by prepending the buffer to the IO stream, which has already advanced past those bytes
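The steps above can be sketched in a few lines of Ruby. This is a simplified, hypothetical stand-in: SketchPeekableIO and detect_col_sep are illustrative names, not SmarterCSV's API. It only supports byte-count reads and a naive col_sep guess, but it shows the key property that the wrapped IO is never rewound, so it works on a pipe.

```ruby
# Hypothetical sketch, not SmarterCSV's PeekableIO: peek at the first N
# bytes, let auto-detection inspect them, then replay those bytes before
# continuing from wherever the underlying IO already is. The wrapped IO is
# never rewound, so non-seekable sources such as pipes work too.
class SketchPeekableIO
  def initialize(io, peek_bytes = 4096)
    @io = io
    # Fetch the first N bytes into an in-memory buffer (raw binary bytes).
    @buffer = io.read(peek_bytes) || String.new(encoding: Encoding::ASCII_8BIT)
    @pos = 0 # read position inside @buffer
  end

  # Auto-detection only ever looks at the buffered bytes.
  def peek
    @buffer
  end

  # "Rewinding" resets the position in the buffer, never the IO itself.
  def rewind
    @pos = 0
  end

  # Serve bytes from the buffer first, then continue reading from the IO.
  # Returns nil at end of input, like IO#read(length).
  def read(length)
    out = @buffer.byteslice(@pos, length)
    @pos += out.bytesize
    if out.bytesize < length
      more = @io.read(length - out.bytesize)
      out << more if more
    end
    out.empty? ? nil : out
  end
end

# Hypothetical helper: naive col_sep guess from the first buffered line.
def detect_col_sep(sample)
  first_line = sample.lines.first.to_s
  [",", ";", "\t"].max_by { |sep| first_line.count(sep) }
end

reader, writer = IO.pipe                   # a pipe cannot be rewound
writer.write("a;b;c\n1;2;3\n4;5;6\n")
writer.close

peekable = SketchPeekableIO.new(reader, 8) # tiny peek size for the demo
col_sep = detect_col_sep(peekable.peek)    # sees only "a;b;c\n1;"
peekable.rewind                            # resets the buffer, not the pipe

data = +""
while (chunk = peekable.read(5))
  data << chunk
end
reader.close

puts col_sep.inspect                       # => ";"
puts data == "a;b;c\n1;2;3\n4;5;6\n"
```

The peek size here is deliberately smaller than the input, so the replay has to stitch the buffered bytes onto the rest of the pipe; no byte is lost or read twice.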

tilo added 20 commits April 11, 2026 17:46
  PeekableIO: the buffer stores raw bytes in the external encoding;
  maybe_transcode applies the ext→int conversion on read-out. The
  @buffer_frozen flag prevents premature delegation to @io before rewind,
  so all bytes consumed during auto-detection can be replayed. each_char
  falls back to ASCII_8BIT (not UTF-8) for sources with no declared encoding.
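A rough Ruby illustration of "store raw bytes, transcode on read-out", using a hypothetical helper transcode_on_read (not the actual maybe_transcode signature). One plausible motivation, assuming the peek is a fixed byte count: such a peek can end in the middle of a multibyte character, so converting the buffer eagerly could fail, while raw bytes can be replayed unchanged. The @buffer_frozen flag is not modeled here.

```ruby
# Hypothetical helper, not PeekableIO#maybe_transcode itself: the raw
# bytes are only labeled with the external encoding, and converted to
# the internal encoding when they leave the buffer.
def transcode_on_read(raw, external, internal)
  tagged = raw.dup.force_encoding(external) # relabel, bytes unchanged
  return tagged if internal.nil? || internal == external
  tagged.encode(internal)                   # actual ext -> int conversion
end

# ISO-8859-1 bytes for "café;naïve\n" (0xE9 = é, 0xEF = ï).
raw = "caf\xE9;na\xEFve\n".b
utf8 = transcode_on_read(raw, Encoding::ISO_8859_1, Encoding::UTF_8)
puts utf8                      # => café;naïve

# A byte-count peek can split a multibyte character: the first 4 bytes
# of UTF-8 "café" end with only half of "é".
partial = "café".b.byteslice(0, 4).force_encoding(Encoding::UTF_8)
puts partial.valid_encoding?   # => false

# With no declared encoding, ASCII-8BIT treats every byte as one character.
puts raw.each_char.count       # => 11
```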

  Reader: enforce_utf8_encoding was using force_encoding('UTF-8'), which
  silently dropped non-ASCII bytes from ISO-8859-1, Windows-1252, Shift-JIS
  and other encodings. It now passes line.encoding as the source encoding
  to encode(), so lines are transcoded correctly.
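A minimal Ruby comparison of the two approaches (illustrative only, not the Reader code). force_encoding only relabels bytes, so Latin-1 bytes relabeled as UTF-8 form an invalid string; if anything downstream then cleans it up, for example with String#scrub, the non-ASCII characters are lost. encode with the line's own encoding as the source converts them instead.

```ruby
# Latin-1 bytes for "café" (0xE9 is "é" in ISO-8859-1).
line = "caf\xE9".b.force_encoding(Encoding::ISO_8859_1)

# Old approach: relabel the bytes as UTF-8. A lone 0xE9 is not valid
# UTF-8, so the string is broken.
relabeled = line.dup.force_encoding(Encoding::UTF_8)
puts relabeled.valid_encoding?   # => false
puts relabeled.scrub("")         # => caf  (the "é" is gone)

# New approach: transcode, using the line's own encoding as the source.
transcoded = line.encode(Encoding::UTF_8, line.encoding)
puts transcoded                  # => café
```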

  Use Encoding::ASCII_8BIT consistently (Encoding::BINARY is an alias).

  Add comprehensive tests: multi-encoding unit tests, transcoding-pair
  integration tests, and non-seekable stream transcoding tests.