Details
- Type: Bug
- Status: Closed
- Priority: Major
- Resolution: Duplicate
- Affects Version: 0.14.0
Description
I'm using pyarrow to read a 40 MB Parquet file.
When reading all of the columns except the "body" column, the process peaks at 170 MB of memory.
Reading only the "body" column results in over 6 GB of memory used.
I made the file publicly accessible: s3://dhavivresearch/pyarrow/demofile.parquet
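For reference, a minimal reproduction sketch, assuming the file has been copied locally as "demofile.parquet" (the local path is an assumption; the non-"body" column list is derived from the file's schema rather than named explicitly):

```python
import pyarrow.parquet as pq

# Read every column except "body"; peak memory stays around 170 MB.
schema = pq.read_schema("demofile.parquet")
other_columns = [name for name in schema.names if name != "body"]
table_without_body = pq.read_table("demofile.parquet", columns=other_columns)

# Read only the "body" column; memory usage climbs above 6 GB.
table_body_only = pq.read_table("demofile.parquet", columns=["body"])
```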
Issue Links
- duplicates: ARROW-6060 [Python] too large memory cost using pyarrow.parquet.read_table with use_threads=True (Resolved)
- is caused by: ARROW-6060 [Python] too large memory cost using pyarrow.parquet.read_table with use_threads=True (Resolved)
- relates to: ARROW-3772 [C++] Read Parquet dictionary encoded ColumnChunks directly into an Arrow DictionaryArray (Resolved)