Memory improvements #138
Conversation
When a CSV file is longer than 1000 lines, only the first attempt to fetch its content will succeed. Any subsequent attempt will return only the first 1000 lines. This patch should fix the problem, though it may affect the memory footprint.
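The symptom described above is typical of a stream that is left positioned after its first read. A minimal sketch of the fix, with hypothetical names (`fetch_sample` is not the project's actual API), is to rewind the file object before and after taking the bounded sample:

```python
import io
from itertools import islice

SAMPLE_SIZE = 1000  # hypothetical cap, mirroring the 1000-line limit above


def fetch_sample(fileobj, n_lines=SAMPLE_SIZE):
    """Read up to n_lines from the stream, then rewind so a
    subsequent fetch sees the whole file again instead of the
    leftover tail of the first read."""
    fileobj.seek(0)
    sample = list(islice(fileobj, n_lines))
    fileobj.seek(0)
    return sample


data = io.StringIO("".join(f"row{i}\n" for i in range(2500)))
first = fetch_sample(data)
second = fetch_sample(data)
assert len(first) == len(second) == 1000
assert data.read().count("\n") == 2500  # full content still readable afterwards
```

Without the final `seek(0)`, the second call would start mid-file, which matches the "subsequent attempts return only part of the data" behaviour reported here.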
Wow, that should almost count as a bug. +1
I created these patches because in my project I very often process very large Excel and CSV files (more than 500K lines). Without these patches I was unable to parse some of them because the OOM killer was terminating the Python interpreter on a machine with 4 GB of memory. With these patches the Python process consumes about 400 MB of RAM.
The headers_guess function reads the entire content of the file into memory, which leads to a huge memory footprint when working with large Excel files. This patch fixes the problem by reading only the first 1000 lines of the file into memory. That should be sufficient to find the header, and it saves a lot of memory.
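The idea of bounding the sample can be sketched as follows. This is an illustration, not the library's actual implementation: `sample_rows` and the header heuristic in `guess_header_index` are hypothetical, standing in for whatever logic headers_guess really uses.

```python
import csv
import io
from itertools import islice


def sample_rows(fileobj, limit=1000):
    # Materialize at most `limit` rows instead of the whole file,
    # keeping memory bounded even for files with millions of lines.
    return list(islice(csv.reader(fileobj), limit))


def guess_header_index(rows):
    # Toy heuristic (not the project's actual one): treat the first
    # row whose cells are all non-numeric as the header row.
    for i, row in enumerate(rows):
        if row and all(not cell.replace(".", "", 1).isdigit() for cell in row):
            return i
    return 0


data = io.StringIO("id,name,score\n1,alice,3.5\n2,bob,4.0\n")
rows = sample_rows(data)
assert guess_header_index(rows) == 0
```

Since headers sit at the top of a file in practice, a 1000-row sample is a reasonable bound: the guess is unchanged while the peak memory use no longer grows with file size.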
@pudo it would be nice if you had time to check this, as you expressed interest to me. If it looks good, I can rebase and merge.
These seem like super simple improvements; I've looked at the patches and LGTM. I guess we need to rebase against master to get these in...
I just made a PR for a branch from @karol-szuster because these changes seem to be a valuable improvement. I haven't tested them, but I wanted to document them.