Details
Improvement
Status: Resolved
Major
Resolution: Fixed
None
Description
Related to ARROW-8062, where we will also need a way to expose the global FileMetadata. Independently of that, it would be useful to have access to the FileMetadata of each ParquetFileFragment (e.g. to read the statistics).
This would be relatively simple to implement on the Python/R side: we have access to the file path, so we could read the metadata from the file backing the fragment and return it as a FileMetadata object.
I am wondering whether we want to integrate this with ARROW-8062. When the fragments are created from a _metadata file, a ParquetFileFragment.metadata attribute would not need to read the metadata from the Parquet file; it could take it from the global metadata instead (at least for e.g. the row group data).
Another question: what should this return for a ParquetFileFragment that maps to a single row group?
Issue Links
- is related to ARROW-8062 [C++][Dataset] Parquet Dataset factory from a _metadata/_common_metadata file (Resolved)