Apache Arrow / ARROW-2057

[Python] Configure size of data pages in pyarrow.parquet.write_table


    Details

      Description

      It would be useful to be able to set the size of data pages (within Parquet column chunks) from Python. The default is currently 1 MiB, set at https://github.com/apache/parquet-cpp/blob/0875e43010af485e1c0b506d77d7e0edc80c66cc/src/parquet/properties.h#L81. Lowering it can help in some situations where more granular access is needed, since smaller pages let a reader decode less data to reach a given range of values.

      We should expose this value as a parameter to pyarrow.parquet.write_table.
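
A minimal sketch of what this could look like from Python. The helper names (estimate_page_count, write_with_page_size) are hypothetical and only for illustration. The data_page_size keyword is the name used by later pyarrow releases; this issue itself does not fix the parameter name, so treat it as an assumption. The page count is a rough estimate: the writer treats the page size as a target threshold, and encoding and compression change the real numbers.

```python
import math

# 1 MiB, the parquet-cpp default cited in this issue.
DEFAULT_PAGE_BYTES = 1024 * 1024


def estimate_page_count(chunk_bytes, page_bytes=DEFAULT_PAGE_BYTES):
    """Roughly how many data pages one column chunk is split into.

    Hypothetical helper: ignores encoding and compression, and assumes
    pages are cut at exactly page_bytes, which the writer does not promise.
    """
    if chunk_bytes < 0 or page_bytes <= 0:
        raise ValueError("chunk_bytes must be >= 0 and page_bytes must be > 0")
    # A column chunk always holds at least one data page.
    return max(1, math.ceil(chunk_bytes / page_bytes))


def write_with_page_size(table, path, page_bytes):
    """Write a pyarrow Table with a custom target data page size."""
    # Imported lazily so the estimate above runs without pyarrow installed.
    import pyarrow.parquet as pq

    # Assumption: the keyword is `data_page_size` (bytes), as in later
    # pyarrow releases; the issue only asks for such a parameter.
    pq.write_table(table, path, data_page_size=page_bytes)


if __name__ == "__main__":
    chunk = 8 * 1024 * 1024  # an 8 MiB column chunk
    print(estimate_page_count(chunk))             # pages at the default size
    print(estimate_page_count(chunk, 64 * 1024))  # pages at 64 KiB
```

With a table in hand, write_with_page_size(table, "example.parquet", 64 * 1024) would request 64 KiB pages. More pages mean finer-grained reads, at the cost of more page headers and potentially weaker compression per page.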

              People

              • Assignee: Wes McKinney (wesm)
              • Reporter: Wes McKinney (wesm)

                  Time Tracking

                  Estimated: Not Specified
                  Remaining: 0h
                  Logged: 50m