Memory improvements #138
Conversation
When a CSV file is longer than 1000 lines, only the first attempt to fetch its content will succeed. Any subsequent attempt will return only the first 1000 lines. This patch should fix the problem, though it may affect the memory footprint.
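The symptom described above is typical of a stream that is left positioned after its first read. A minimal sketch of the fix, with hypothetical names (`fetch_sample` is not the project's actual API), is to rewind the file object before and after taking the bounded sample:

```python
import io
from itertools import islice

SAMPLE_SIZE = 1000  # hypothetical cap, mirroring the 1000-line limit above


def fetch_sample(fileobj, n_lines=SAMPLE_SIZE):
    """Read up to n_lines from the stream, then rewind so a
    subsequent fetch sees the whole file again instead of the
    leftover tail of the first read."""
    fileobj.seek(0)
    sample = list(islice(fileobj, n_lines))
    fileobj.seek(0)
    return sample


data = io.StringIO("".join(f"row{i}\n" for i in range(2500)))
first = fetch_sample(data)
second = fetch_sample(data)
assert len(first) == len(second) == 1000
assert data.read().count("\n") == 2500  # full content still readable afterwards
```

Without the final `seek(0)`, the second call would start mid-file, which matches the "subsequent attempts return only part of the data" behaviour reported here.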
Wow, that should almost count as a bug. +1
I created these patches because in my project I very often process very large Excel and CSV files (more than 500K lines). Without these patches I was unable to parse some of them because the OOM killer was terminating the Python interpreter on a machine with 4 GB of memory. With these patches the Python process consumes about 400 MB of RAM.
The headers_guess function reads the entire content of the file into memory, which leads to a huge memory footprint when working with large Excel files. This patch fixes the problem by reading only the first 1000 lines of the file into memory. That should be sufficient to find the header, and it saves a lot of memory.
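The idea of bounding the sample can be sketched as follows. This is an illustration, not the library's actual implementation: `sample_rows` and the header heuristic in `guess_header_index` are hypothetical, standing in for whatever logic headers_guess really uses.

```python
import csv
import io
from itertools import islice


def sample_rows(fileobj, limit=1000):
    # Materialize at most `limit` rows instead of the whole file,
    # keeping memory bounded even for files with millions of lines.
    return list(islice(csv.reader(fileobj), limit))


def guess_header_index(rows):
    # Toy heuristic (not the project's actual one): treat the first
    # row whose cells are all non-numeric as the header row.
    for i, row in enumerate(rows):
        if row and all(not cell.replace(".", "", 1).isdigit() for cell in row):
            return i
    return 0


data = io.StringIO("id,name,score\n1,alice,3.5\n2,bob,4.0\n")
rows = sample_rows(data)
assert guess_header_index(rows) == 0
```

Since headers sit at the top of a file in practice, a 1000-row sample is a reasonable bound: the guess is unchanged while the peak memory use no longer grows with file size.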
@pudo it would be nice if you had time to check this, as you expressed interest to me. If it looks good, I can rebase and merge.
These seem like super simple improvements; I've looked at the patches and LGTM. I guess we need to rebase against master to get these in...
I just made a PR for a branch from @karol-szuster because these changes seem to be a valuable improvement. I haven't tested them, but I wanted to document them.