Apache Arrow / ARROW-8733

[C++][Dataset][Python] ParquetFileFragment should provide access to parquet FileMetadata

Details

    Description

      Related to ARROW-8062 (there we will also need a way to expose the global FileMetadata). Independently of that, it would be useful to have access to the FileMetadata of each ParquetFileFragment (e.g. to get access to the statistics).

      This would be relatively simple to implement on the Python/R side: since we have access to the file path, we could read the metadata from the file backing the fragment and return it as a FileMetadata object.

      I am wondering whether we want to integrate this with ARROW-8062: when the fragments were created from a _metadata file, a ParquetFileFragment.metadata attribute would not need to read the metadata from the Parquet file itself, but could take it from the global metadata (at least for e.g. the row group data).

      Another question: what should happen for a ParquetFileFragment that maps to a single row group?

            People

              bkietz Ben Kietzman
              jorisvandenbossche Joris Van den Bossche

                Time Tracking

                  Estimated: Not Specified
                  Remaining: 0h
                  Logged: 2h 20m