[ARROW-8250] [C++] Add "random access" / slice read API to RecordBatchFileReader - ASF JIRA

Details

Type: Improvement
Status: Open
Priority: Major
Resolution: Unresolved
Affects Version/s: None
Fix Version/s: None
Component/s: C++
Labels: None
External issue URL: https://github.com/apache/arrow/issues/24446

Description

If you want to read only a small section of a file, there is currently no easy way to determine which record batches are relevant and need "rehydrating" (deserializing).

I propose the following:

* A way to cheaply read all of the RecordBatch metadata, and cache it so this does not have to be done multiple times, without deserializing the record batch data structures themselves.
* Based on that metadata, determine the range of batches that need to be rehydrated, then slice them accordingly to produce the Table of interest.

This functionality could also be lifted into the Feather read APIs.
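
The two steps proposed above could be sketched roughly as follows. This is only an illustration of the idea, not an existing Arrow API: `BatchIndex`, `BatchSlice`, and `Plan` are hypothetical names, and the sketch assumes the per-batch row counts have already been obtained cheaply from the batch metadata. It caches cumulative row offsets once, then maps a requested row range onto the batches that overlap it.

```cpp
#include <algorithm>
#include <cstdint>
#include <vector>

// Hypothetical: the part of one record batch needed for a row-range read.
struct BatchSlice {
  int batch_index;  // which record batch in the file
  int64_t offset;   // first row to keep, relative to the start of that batch
  int64_t length;   // number of rows to keep from that batch
};

// Hypothetical cache built once from per-batch row counts (assumed to come
// from batch metadata, without deserializing any column data).
class BatchIndex {
 public:
  explicit BatchIndex(const std::vector<int64_t>& batch_num_rows) {
    starts_.reserve(batch_num_rows.size() + 1);
    starts_.push_back(0);
    for (int64_t n : batch_num_rows) starts_.push_back(starts_.back() + n);
  }

  int64_t num_rows() const { return starts_.back(); }

  // Batches overlapping rows [offset, offset + length), clamped to the file.
  std::vector<BatchSlice> Plan(int64_t offset, int64_t length) const {
    std::vector<BatchSlice> plan;
    const int64_t begin = std::max<int64_t>(offset, 0);
    const int64_t end = std::min<int64_t>(offset + length, num_rows());
    if (begin >= end) return plan;

    // Binary search for the batch that contains row `begin`: the first
    // cumulative start strictly greater than `begin` belongs to the next batch.
    auto it = std::upper_bound(starts_.begin(), starts_.end(), begin);
    size_t i = static_cast<size_t>(it - starts_.begin()) - 1;

    for (; i + 1 < starts_.size() && starts_[i] < end; ++i) {
      const int64_t lo = std::max(begin, starts_[i]);
      const int64_t hi = std::min(end, starts_[i + 1]);
      // Empty batches (hi == lo) are skipped, so they are never rehydrated.
      if (hi > lo) {
        plan.push_back({static_cast<int>(i), lo - starts_[i], hi - lo});
      }
    }
    return plan;
  }

 private:
  // starts_[i] is the file-wide row index of batch i's first row;
  // starts_.back() is the total row count.
  std::vector<int64_t> starts_;
};
```

For a file whose batches hold 100, 50, 0, and 200 rows, `Plan(120, 60)` selects rows 20 to 49 of batch 1 and rows 0 to 29 of batch 3, so only those two batches would need to be rehydrated and sliced. A real implementation would presumably read each selected batch through the file reader and concatenate the slices into a Table.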

People

Assignee: Unassigned
Reporter: Wes McKinney
Votes: 0
Watchers: 4

Dates

Created: 28/Mar/20 00:51
Updated: 11/Jan/23 07:59