Direct reader universal line endings #640
Merged
PositionAwareStreamReaderDirect now detects the actual terminator per line instead of guessing one constant for the whole file.
It scans with IndexOfAny(SearchValues.Create("\r\n")) and classifies each hit as \n, \r\n, or a bare \r (classic-Mac), including the \r-at-block-boundary straddle.
Byte Position advances by the content bytes plus the real terminator's bytes, so it stays exact on files that interleave line endings.
This gives the default reader the two capabilities that previously only System/Legacy had:
- bare \r no longer renders a classic-Mac file as one giant line
- Position no longer drifts on mixed \n/\r\n files, keeping seeks into flushed buffers correct
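The scan-and-classify step described above can be sketched roughly as follows. This is a minimal illustration, not the PR's code: `ScanLines` is a hypothetical helper that works on already-decoded text held fully in memory, whereas the real reader works block by block on a stream. It assumes .NET 8 or later for `SearchValues`.

```csharp
using System;
using System.Buffers;
using System.Collections.Generic;
using System.Text;

// Hypothetical helper (an assumption for illustration): classifies each terminator
// and records the byte offset at which every line starts.
static List<(long Start, string Text, string Terminator)> ScanLines(string text, Encoding encoding)
{
    // One search covers both terminator characters.
    SearchValues<char> terminators = SearchValues.Create("\r\n");
    var lines = new List<(long Start, string Text, string Terminator)>();
    ReadOnlySpan<char> rest = text;
    long position = 0;

    while (!rest.IsEmpty)
    {
        int hit = rest.IndexOfAny(terminators);
        if (hit < 0)
        {
            // Final line without a terminator.
            lines.Add((position, rest.ToString(), ""));
            break;
        }

        // Classify the hit: "\n", "\r\n", or a bare "\r".
        string terminator =
            rest[hit] == '\n' ? "\n"
            : (hit + 1 < rest.Length && rest[hit + 1] == '\n') ? "\r\n"
            : "\r";

        lines.Add((position, rest[..hit].ToString(), terminator));

        // Advance by the content bytes plus the real terminator's bytes.
        position += encoding.GetByteCount(rest[..hit]) + encoding.GetByteCount(terminator);
        rest = rest[(hit + terminator.Length)..];
    }

    return lines;
}

foreach (var line in ScanLines("ab\ncd\r\nef\rgh", Encoding.UTF8))
    Console.WriteLine($"{line.Start}: {line.Text}");
// 0: ab
// 3: cd
// 7: ef
// 10: gh
```

Because the advance is computed from the encoded byte count rather than the character count, a multibyte character such as `é` moves the next line's start by two bytes in UTF-8.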
The guessed _newLineSequenceLength field and GuessNewLineSequenceLength (with its seek-reset-reread) are replaced by a lazy EnsureInitialized that fills the first block from the current stream position. A new ResetReader override resets scan state on a mid-stream seek, which also fixes a latent stale-block scan after a Position change.
Tests: 11 new TDD cases covering bare \r, mixed endings, repeated terminators, trailing \r and \r\n, \n\r, multibyte exact positions, and \r / \r\n landing exactly on the 32 KB block boundary. The full reader suite is green (97 tests), and the Direct throughput benchmark shows no regression.
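The block-boundary cases are the subtle ones: when a block ends in \r, the reader cannot know whether it is a bare \r or the first half of \r\n until the next block arrives. One way to handle this is a carried-over flag, sketched below. `SplitAcrossBlocks` and `pendingCr` are hypothetical names, the blocks are plain strings of arbitrary size rather than 32 KB buffers, and the PR's actual mechanism may differ.

```csharp
using System;
using System.Buffers;
using System.Collections.Generic;
using System.Text;

// Hypothetical sketch: splits text that arrives in blocks, resolving a '\r'
// that lands on the last character of a block once the next block is seen.
static List<(string Text, string Terminator)> SplitAcrossBlocks(IEnumerable<string> blocks)
{
    SearchValues<char> terminators = SearchValues.Create("\r\n");
    var lines = new List<(string Text, string Terminator)>();
    var current = new StringBuilder();
    bool pendingCr = false; // previous block ended with '\r'; its line is already emitted

    foreach (string block in blocks)
    {
        ReadOnlySpan<char> rest = block;

        if (pendingCr && !rest.IsEmpty)
        {
            if (rest[0] == '\n')
            {
                // The straddling pair was really "\r\n": upgrade the last line's terminator.
                lines[lines.Count - 1] = (lines[lines.Count - 1].Text, "\r\n");
                rest = rest[1..];
            }
            pendingCr = false;
        }

        while (!rest.IsEmpty)
        {
            int hit = rest.IndexOfAny(terminators);
            if (hit < 0) { current.Append(rest); break; }

            current.Append(rest[..hit]);
            bool isCr = rest[hit] == '\r';
            bool crAtEnd = isCr && hit + 1 == rest.Length;
            bool crLf = isCr && !crAtEnd && rest[hit + 1] == '\n';

            lines.Add((current.ToString(), crLf ? "\r\n" : isCr ? "\r" : "\n"));
            current.Clear();
            pendingCr = crAtEnd;
            rest = rest[(hit + (crLf ? 2 : 1))..];
        }
    }

    // Trailing content without a terminator.
    if (current.Length > 0) lines.Add((current.ToString(), ""));
    return lines;
}

var result = SplitAcrossBlocks(new[] { "ab\r", "\ncd\r", "ef" });
foreach (var (text, terminator) in result)
    Console.WriteLine($"{text} ({terminator.Length} terminator chars)");
// ab (2 terminator chars)
// cd (1 terminator chars)
// ef (0 terminator chars)
```

The first block ends in \r and the second begins with \n, so the pair is counted once as \r\n. The \r ending the second block is followed by ordinary text and stays a bare \r.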
Adds byte-exact manual/GUI fixtures (LF, CRLF, CR, Mixed) under TestData with a regenerator and README, pinned `binary` in .gitattributes so core.autocrlf cannot corrupt their terminators.
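A fixture regenerator of this kind could look roughly like the sketch below. It is an illustration under assumptions, not the PR's regenerator: the file names, the sample content, and the output directory are all hypothetical. The point it shows is that terminators are written as explicit escape sequences and emitted as raw bytes, so neither the source file's own line endings nor a text-mode writer can change them.

```csharp
using System;
using System.Collections.Generic;
using System.IO;
using System.Text;

// Joins lines with the given terminators, cycling through them in order.
static byte[] Join(string[] lines, params string[] terminators)
{
    var text = new StringBuilder();
    for (int i = 0; i < lines.Length; i++)
        text.Append(lines[i]).Append(terminators[i % terminators.Length]);
    // Encoding.UTF8.GetBytes does not emit a byte-order mark.
    return Encoding.UTF8.GetBytes(text.ToString());
}

// Hypothetical fixture set mirroring the four kinds named above.
static Dictionary<string, byte[]> BuildFixtures()
{
    string[] lines = { "alpha", "beta", "gamma" };
    return new Dictionary<string, byte[]>
    {
        ["LF.txt"] = Join(lines, "\n"),
        ["CRLF.txt"] = Join(lines, "\r\n"),
        ["CR.txt"] = Join(lines, "\r"),
        ["Mixed.txt"] = Join(lines, "\n", "\r\n", "\r"),
    };
}

// Assumed output location; a real regenerator would target the TestData folder.
string dir = Path.Combine(Path.GetTempPath(), "line-ending-fixtures");
Directory.CreateDirectory(dir);
foreach (var (name, bytes) in BuildFixtures())
{
    File.WriteAllBytes(Path.Combine(dir, name), bytes);
    Console.WriteLine($"{name}: {bytes.Length} bytes");
}
```

Writing with `File.WriteAllBytes` keeps the files byte-exact on disk; the `binary` attribute in .gitattributes then keeps Git from normalizing them on checkout or commit.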
Handle all line endings exactly in Direct stream reader