Details
- Type: Bug
- Status: Resolved
- Priority: Major
- Resolution: Fixed
- Affects Version: 0.14.1
Description
I tried to load a Parquet file of about 1.8 GB using the following code. It crashed due to an out-of-memory issue.
import pyarrow.parquet as pq

pq.read_table('/tmp/test.parquet')
However, it worked well with use_threads=False, as follows:
pq.read_table('/tmp/test.parquet', use_threads=False)
If pyarrow is downgraded to 0.12.1, there is no such problem.
Issue Links
- causes
  - ARROW-5993 [Python] Reading a dictionary column from Parquet results in disproportionate memory usage (Closed)
- is duplicated by
  - ARROW-5993 [Python] Reading a dictionary column from Parquet results in disproportionate memory usage (Closed)
  - ARROW-6380 Method pyarrow.parquet.read_table has memory spikes from version 0.14 (Closed)
- is related to
  - ARROW-6230 [R] Reading in Parquet files are 20x slower than reading fst files in R (Resolved)
- relates to
  - ARROW-6059 [Python] Regression memory issue when calling pandas.read_parquet (Closed)
- links to