Apache Arrow / ARROW-8250

[C++] Add "random access" / slice read API to RecordBatchFileReader

Details

    • Type: Improvement
    • Status: Open
    • Priority: Major
    • Resolution: Unresolved
    • Component: C++

    Description

      If you want to read a small section of a file, there is currently no easy way to determine which record batches need to be "rehydrated" (deserialized) to cover it.

      I would propose the following:

      • A way to cheaply read all the RecordBatch metadata without deserializing the record batch data structures themselves, and to cache it so that this only has to be done once
      • Based on that metadata, you can determine the range of batches that need to be rehydrated, then slice them to produce the Table of interest
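
      The two steps above can be sketched roughly as follows. This is only an illustration using the standard library, not Arrow code: `BatchIndex`, `BatchSlice`, and `Plan` are hypothetical names, and the sketch assumes the per-batch row counts have already been obtained from the metadata somehow, which is the part this issue proposes to add.

```cpp
#include <algorithm>
#include <cassert>
#include <cstdint>
#include <iterator>
#include <vector>

// Hypothetical: the part of one record batch that falls inside the requested
// row range. Not an Arrow type.
struct BatchSlice {
  int batch_index;  // index of the batch within the file
  int64_t offset;   // first row to keep, relative to the start of the batch
  int64_t length;   // number of rows to keep from this batch
};

// Hypothetical cache of cumulative row offsets, built once from the per-batch
// row counts (assumed to come from metadata only).
class BatchIndex {
 public:
  explicit BatchIndex(const std::vector<int64_t>& batch_lengths) {
    // starts_[i] is the global row number of the first row of batch i;
    // the last entry is the total number of rows in the file.
    starts_.reserve(batch_lengths.size() + 1);
    int64_t total = 0;
    starts_.push_back(0);
    for (int64_t n : batch_lengths) {
      assert(n >= 0);
      total += n;
      starts_.push_back(total);
    }
  }

  int64_t num_rows() const { return starts_.back(); }

  // Returns the batches that overlap rows [offset, offset + length) and the
  // slice to take from each. The range is clamped to the rows in the file.
  std::vector<BatchSlice> Plan(int64_t offset, int64_t length) const {
    assert(length >= 0);
    std::vector<BatchSlice> plan;
    const int64_t begin = std::min(std::max<int64_t>(offset, 0), num_rows());
    const int64_t end = std::min(std::max(offset + length, begin), num_rows());
    if (begin >= end) return plan;

    // Binary search for the last batch whose first row is <= begin.
    auto it = std::upper_bound(starts_.begin(), starts_.end(), begin);
    int i = static_cast<int>(std::distance(starts_.begin(), it)) - 1;

    for (; starts_[i] < end; ++i) {
      const int64_t lo = std::max(begin, starts_[i]);
      const int64_t hi = std::min(end, starts_[i + 1]);
      if (hi > lo) {  // skips empty batches
        plan.push_back({i, lo - starts_[i], hi - lo});
      }
    }
    return plan;
  }

 private:
  std::vector<int64_t> starts_;
};
```

      Each entry in the returned plan could then be turned into data with the existing `RecordBatchFileReader::ReadRecordBatch(i)` followed by `RecordBatch::Slice(offset, length)`, and the slices combined with `Table::FromRecordBatches`. Only the batches named in the plan would need to be deserialized.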

      This functionality could also be lifted into the Feather read APIs.

          People

            Assignee: Unassigned
            Reporter: Wes McKinney (wesm)
            Votes: 0
            Watchers: 4
