When writing large Parquet files (above 10 GB or so) from Pandas, the write succeeds, but when the Parquet file is loaded back with pyarrow, the error message
    ArrowIOError: Invalid parquet file. Corrupt footer.
appears. The same error occurs when the Parquet file is written chunkwise. When the files are small, say under 5 GB or so (drawn randomly from the same dataset), everything works as expected. I've also tried writing with Pandas' df.to_parquet(), with the same result. A rough sketch of the write path is below.
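For reference, a minimal sketch of the write path (the column name, sizes, and file paths are placeholders rather than the real dataset; the chunkwise variant uses pyarrow's ParquetWriter):

    import numpy as np
    import pandas as pd
    import pyarrow as pa
    import pyarrow.parquet as pq

    # Placeholder DataFrame standing in for the real dataset.
    df = pd.DataFrame({"x": np.random.rand(10_000_000)})
    table = pa.Table.from_pandas(df)

    # One-shot write.
    pq.write_table(table, "data.parquet")

    # Chunkwise write: stream batches through a single ParquetWriter.
    with pq.ParquetWriter("data_chunked.parquet", table.schema) as writer:
        for batch in table.to_batches(max_chunksize=1_000_000):
            writer.write_table(pa.Table.from_batches([batch]))

    # Reading back is where the error appears on large files:
    #     ArrowIOError: Invalid parquet file. Corrupt footer.
    loaded = pq.read_table("data.parquet")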
Update: it looks like any DataFrame above ~5 GB (on-disk size) produces the same error.