Apache Arrow / ARROW-16719

[Python] Add path/URI + filesystem handling to parquet.read_metadata

Details

    Description

      Currently you can pass a local file path, a file-like object, a URI (e.g. "s3://...") or a path+filesystem combo to parquet.read_table.
      But the parquet.read_metadata and parquet.read_schema functions (being small wrappers around ParquetFile) only accept a local file path or file-like object. I propose adding the same path+filesystem handling to those functions as in read_table, so that their capabilities are consistent.

      (I ran into this in geopandas, where we use read_table to read the actual data, but also need read_metadata to inspect the Parquet FileMetaData.)

            People

              kshitij12345 Kshiteej K
              jorisvandenbossche Joris Van den Bossche
              Votes: 0
              Watchers: 5

              Dates

                Created:
                Updated:
                Resolved:

                Time Tracking

                  Estimated: Not Specified
                  Remaining: 0h
                  Logged: 7h