ARROW-9299 (Apache Arrow): [Python] Expose ORC metadata() in Python ORCFile


Details

    Description

      There is currently no way for a user to directly access the underlying ORC metadata of a given file. The C++ functions and objects already seem to exist; what is missing is the plumbing through Cython/Python, plus possibly a few C++ shims. Letting users retrieve the metadata without first reading the entire file would help many applications improve query performance, because they could determine up front which ORC stripes need to be read.

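      This would allow for something like:

      import pyarrow.orc as orc  # "import pyarrow" alone does not load the orc submodule

      filename = "example.orc"  # path to an existing ORC file
      orc_metadata = orc.ORCFile(filename).metadata()  # proposed API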
      


            People

              yingzhou474 Ian Alexander Joiner
              jeremy.dyer Jeremy Dyer
              Votes: 1
              Watchers: 6


                Time Tracking

                  Original estimate: Not Specified
                  Remaining: 0h
                  Logged: 3.5h