Apache Arrow / ARROW-1983

[Python] Add ability to write parquet `_metadata` file

Details

    Description

      Currently pyarrow.parquet can only write the _common_metadata file (mostly just schema information). It would be useful to add the ability to write a _metadata file as well. This should include information about each row group in the dataset, including summary statistics. Having this summary file would allow filtering of row groups without needing to access each file beforehand.
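
      To illustrate why per-row-group statistics make such filtering possible, here is a minimal pure-Python sketch. The `RowGroupSummary` class and `select_row_groups` function are hypothetical stand-ins invented for illustration, not pyarrow or parquet-cpp APIs, and the min/max values are made up. The point is only that a reader holding the summary can discard row groups whose value range cannot match a predicate, without opening the data files.

```python
from dataclasses import dataclass


@dataclass
class RowGroupSummary:
    """Hypothetical stand-in for one row group's entry in a summary file."""
    file_path: str
    row_group: int
    min_value: int
    max_value: int


def select_row_groups(summaries, lower, upper):
    """Return the row groups whose [min, max] range overlaps [lower, upper]."""
    return [
        s for s in summaries
        if s.max_value >= lower and s.min_value <= upper
    ]


# Illustrative statistics for three row groups spread over two files.
summaries = [
    RowGroupSummary("part-0.parquet", 0, 1, 100),
    RowGroupSummary("part-0.parquet", 1, 101, 200),
    RowGroupSummary("part-1.parquet", 0, 201, 300),
]

# Only row groups that can contain values in [150, 250] need to be read.
matches = select_row_groups(summaries, 150, 250)
print([(s.file_path, s.row_group) for s in matches])
# → [('part-0.parquet', 1), ('part-1.parquet', 0)]
```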

      This would require that the user can retrieve the written RowGroups from a pyarrow.parquet.write_table call and pass them as a list to a new function, which would hand them to parquet-cpp as C++ objects to generate the corresponding _metadata file.

            People

              rjzamora Rick Zamora
              jim.crist Jim Crist
              Votes: 2
              Watchers: 11

                Time Tracking

                  Estimated: Not Specified
                  Remaining: 0h
                  Logged: 9.5h