Details
Type: Improvement
Status: Resolved
Priority: Major
Resolution: Fixed
None
Description
Currently, when chunk_size is not given in write_parquet(), it defaults to a single chunk containing all of the rows in the table. That may be fine for tables with few rows, but for tables with many rows we want files that contain a reasonable number of row groups rather than one very large one.
It looks like this was added in https://github.com/apache/arrow/pull/5451 and wasn't discussed there, so this default might not be intentional.
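To make the request concrete, here is a minimal sketch in Python of the kind of default that could replace "one chunk for everything". It is not the arrow implementation: the function names default_chunk_size and row_group_bounds, the target of 250,000 cells per row group, and the cap of 200 row groups are all assumptions chosen only for illustration.

```python
import math


def default_chunk_size(num_rows, num_cols,
                       target_cells_per_group=250_000, max_groups=200):
    """Hypothetical heuristic for a row-group size when chunk_size is not given.

    The target and cap are illustrative assumptions, not values from arrow.
    """
    if num_rows <= 0 or num_cols <= 0:
        # Nothing meaningful to split; fall back to a trivial positive size.
        return max(num_rows, 1)
    total_cells = num_rows * num_cols
    # Groups needed to reach the target cell count, clamped to [1, max_groups].
    num_groups = min(max(math.ceil(total_cells / target_cells_per_group), 1),
                     max_groups)
    # Rows per group, rounded up so the groups cover every row.
    return math.ceil(num_rows / num_groups)


def row_group_bounds(num_rows, chunk_size):
    """Return (start, stop) row ranges, one per row group."""
    return [(start, min(start + chunk_size, num_rows))
            for start in range(0, num_rows, chunk_size)]
```

Under these assumed settings a small table still ends up as a single row group, while a large table is split into several, with the cap keeping the number of row groups bounded for very large tables.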