Details
Type: Improvement
Status: Open
Priority: Major
Resolution: Unresolved
Description
We have a unit test
https://github.com/apache/arrow/blob/master/cpp/src/parquet/arrow/arrow-reader-writer-test.cc#L933
that reads one record at a time from a Parquet-Arrow column reader. There is logic in RecordReader that shifts the definition/repetition levels after each read to account for the data consumed by previous records, but this is inefficient when reading one record at a time:
https://github.com/apache/arrow/blob/master/cpp/src/parquet/column_reader.cc#L1011
This should be refactored so that it does not require this copying, or at least so that it only "shifts" the levels occasionally.
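One possible way to only "shift" the levels occasionally is sketched below. This is not the actual RecordReader code: `LevelBuffer`, `Append`, `Consume`, `Compact` and `shift_count` are hypothetical names used only for illustration. The idea is to track a read position and move the unread levels to the front only once the consumed prefix is at least as large as the unread remainder, so each copy is paid for by at least as many consumed levels.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

// Hypothetical stand-in for the level storage of a record reader.
// It is NOT the Arrow/Parquet RecordReader API.
class LevelBuffer {
 public:
  // Append newly decoded levels after the ones already buffered.
  void Append(const std::vector<int16_t>& levels) {
    data_.insert(data_.end(), levels.begin(), levels.end());
  }

  // Mark the next `n` unread levels as consumed. The unread levels are
  // shifted to the front only when the consumed prefix is at least as
  // large as what remains, instead of on every call.
  void Consume(std::size_t n) {
    assert(n <= remaining());
    position_ += n;
    if (position_ > 0 && position_ >= remaining()) {
      Compact();
    }
  }

  // Number of levels that have not been consumed yet.
  std::size_t remaining() const { return data_.size() - position_; }

  // Pointer to the first unread level (valid while remaining() > 0).
  const int16_t* front() const { return data_.data() + position_; }

  // How many times the buffer has been shifted so far.
  std::size_t shift_count() const { return shift_count_; }

 private:
  // Move the unread levels to the start of the buffer and drop the rest.
  void Compact() {
    const std::size_t unread = remaining();
    std::copy(data_.begin() + position_, data_.end(), data_.begin());
    data_.resize(unread);
    position_ = 0;
    ++shift_count_;
  }

  std::vector<int16_t> data_;
  std::size_t position_ = 0;
  std::size_t shift_count_ = 0;
};
```

With this policy, consuming a large buffer one level at a time triggers a shift only a logarithmic number of times, and the total number of copied levels stays below the number of consumed levels. The threshold is an arbitrary choice for the sketch; a real fix would need to fit the reader's existing buffer management.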