Open
Labels
enhancement: New feature or request
Description
The process works fine but consumes a lot of RAM. I only realised this after hitting consistent OOM errors and adding some extra logging statements.
When text files are compressed, the polars batch reader decompresses the file in memory and then reads it.
After digging through some polars issues, I found that a good workaround is to stream the CSV with pyarrow.csv.
Previously we used Python's csv module to create pyarrow record batches, which was very slow. pyarrow.csv should be much faster, but we'd need to be careful and profile it.