Apache Arrow / ARROW-8733

[C++][Dataset][Python] ParquetFileFragment should provide access to parquet FileMetadata

      Description

      Related to ARROW-8062 (there we will also need a way to expose the global FileMetadata). Independently of that, it would be useful to have access to the FileMetadata of each ParquetFileFragment (e.g. to read the statistics).

      This would be relatively simple to implement on the Python/R side: we have access to the file path, so we could read the metadata from the file backing the fragment and return it as a FileMetadata object.

      I am wondering whether we want to integrate this with ARROW-8062: when the fragments are created from a _metadata file, a ParquetFileFragment.metadata attribute would not need to read the parquet file itself, but could use the global metadata instead (at least for e.g. the row group data).

      Another question: what should this return for a ParquetFileFragment that maps to a single row group?

              People

              • Assignee:
                bkietz Ben Kietzman
                Reporter:
                jorisvandenbossche Joris Van den Bossche
              • Votes:
                0
                Watchers:
                5

                  Time Tracking

                  Estimated:
                  Not Specified
                  Remaining:
                  0h
                  Logged:
                  2h 20m