Details
- Type: Bug
- Status: Closed
- Priority: Major
- Resolution: Duplicate
- Affects Version/s: 0.14.0, 0.14.1
- None
- None
- Environment: Ubuntu 18, 16GB RAM, 4 CPUs
Description
pyarrow.parquet.read_table is very slow and causes RAM spikes starting with version 0.14.0.
Reading a 40MB parquet file takes less than 1 second in versions 0.11, 0.12 and 0.13, whereas it takes from 6 to 30 seconds in versions 0.14.x.
This performance regression is easy to measure. However, there is a second problem that I could only detect by watching htop: while opening the 40MB parquet file, the process occupies almost 16GB for a few milliseconds. The resulting pyarrow table takes around 300MB in the Python process (measured with memory-profiler). This does not happen in version 0.13 or earlier.
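A rough way to capture both symptoms in one script, rather than watching htop, might look like the sketch below. It is not from the original report: the helper name `time_and_peak_rss` and the file name `data.parquet` are hypothetical, and it assumes Linux, where `ru_maxrss` is reported in kilobytes. Because `ru_maxrss` is a process-wide high-water mark, it should also catch a spike that lasts only a few milliseconds.

```python
import resource
import time


def time_and_peak_rss(func, *args, **kwargs):
    """Call func and return (result, elapsed_seconds, peak_rss_kb).

    peak_rss_kb is the highest resident set size the process has reached
    so far (kilobytes on Linux), so it includes short-lived spikes.
    """
    start = time.perf_counter()
    result = func(*args, **kwargs)
    elapsed = time.perf_counter() - start
    peak_rss_kb = resource.getrusage(resource.RUSAGE_SELF).ru_maxrss
    return result, elapsed, peak_rss_kb


# Possible usage (assumes pyarrow is installed and "data.parquet" exists;
# both the path and the choice of use_threads are illustrative):
#
#   import pyarrow.parquet as pq
#   table, elapsed, peak_kb = time_and_peak_rss(
#       pq.read_table, "data.parquet", use_threads=True
#   )
#   print(f"read in {elapsed:.2f}s, peak RSS {peak_kb / 1024:.0f} MB")
```

Since the high-water mark never decreases within a process, each configuration (for example `use_threads=True` versus `use_threads=False`, or different pyarrow versions) would need to run in a fresh Python process for the peak figures to be comparable.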
Issue Links
- duplicates ARROW-6060 [Python] too large memory cost using pyarrow.parquet.read_table with use_threads=True (Resolved)