Details
Type: Improvement
Status: Open
Priority: Major
Resolution: Unresolved
Description
The Substrait format allows for "sliced reads", which read only part of a file and would most likely be used if a read operation were distributed across multiple files.
For each file, a start byte and a length are specified. Files that contain indivisible "groups" (e.g. Parquet row groups) cannot be split at an arbitrary byte, so a heuristic decides which groups belong to a slice. For example, read all row groups whose midpoint is contained in the interval.
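
A minimal sketch of the midpoint heuristic described above, in Python for illustration only. The `RowGroup` type and the `select_row_groups` function are hypothetical names, not part of Substrait or any Parquet library. The sketch also assumes the slice is the half-open interval `[start, start + length)`, which the description does not specify.

```python
from dataclasses import dataclass
from typing import List


@dataclass(frozen=True)
class RowGroup:
    """Hypothetical descriptor for one indivisible group in a file."""
    index: int   # position of the group within the file
    start: int   # byte offset of the group's first byte
    length: int  # size of the group in bytes


def select_row_groups(groups: List[RowGroup],
                      slice_start: int,
                      slice_length: int) -> List[int]:
    """Return indices of groups whose midpoint lies in the slice.

    Assumption: the slice is half-open, [slice_start, slice_start + slice_length).
    """
    slice_end = slice_start + slice_length
    selected = []
    for group in groups:
        # Compare doubled values so the midpoint never needs a fraction.
        twice_midpoint = 2 * group.start + group.length
        if 2 * slice_start <= twice_midpoint < 2 * slice_end:
            selected.append(group.index)
    return selected


# Three groups with midpoints at bytes 50, 150 and 225.
groups = [RowGroup(0, 0, 100), RowGroup(1, 100, 100), RowGroup(2, 200, 50)]
first_half = select_row_groups(groups, 0, 125)     # bytes [0, 125)
second_half = select_row_groups(groups, 125, 125)  # bytes [125, 250)
```

With a half-open interval, slices that tile the file without gaps or overlap assign each group to exactly one slice, even though no slice boundary lines up with a group boundary.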