Details
- Bug
- Status: Resolved
- Major
- Resolution: Fixed
- None
Description
The Parquet scanner implements batch readahead by applying a readahead generator to the generator returned by parquet::arrow::FileReader::GetRecordBatchGenerator. However, that generator is constructed with MakeConcatenatedGenerator, which, regrettably, carries this comment:
> This generator is async-reentrant but will never pull from source reentrantly and will never pull from any subscription reentrantly.
A readahead generator only helps if it can pull from its source again before earlier pulls have completed, so this effectively prevents any batch readahead from happening, and the file is always read one batch at a time. Part of the problem seems to be that ReadOneRowGroup in reader.cc returns a RecordBatchGenerator when it seems it should be able to return a single RecordBatch. For my testing, I changed it to return a single record batch, which allowed me to get rid of the concatenated generator. Batch readahead then appeared to work properly, but I have not fully confirmed that the change is correct.
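To make the effect concrete, here is a minimal standalone sketch of the behavior described above. It is not Arrow code: Source, SerializingWrapper, Readahead, and the two helper functions are hypothetical names invented for illustration, and the "reads" are simulated with counters rather than real futures. It only models the assumption that a wrapper which never pulls from its source reentrantly caps the number of in-flight reads at one, regardless of the readahead depth requested.

```cpp
#include <algorithm>
#include <cassert>

// Hypothetical stand-in for an I/O source. Each Pull() starts a "read" that
// stays in flight until FinishOne() is called.
struct Source {
  int in_flight = 0;
  int max_in_flight = 0;

  void Pull() {
    ++in_flight;
    max_in_flight = std::max(max_in_flight, in_flight);
  }

  void FinishOne() {
    if (in_flight > 0) --in_flight;
  }
};

// Hypothetical wrapper modeling the quoted comment: it accepts overlapping
// pulls from its caller, but never pulls from the source while an earlier
// pull is still outstanding. Extra pulls are queued instead.
struct SerializingWrapper {
  Source& source;
  int queued = 0;  // pulls accepted but not yet forwarded to the source

  void Pull() {
    if (source.in_flight == 0) {
      source.Pull();
    } else {
      ++queued;
    }
  }

  void FinishOne() {
    source.FinishOne();
    if (queued > 0 && source.in_flight == 0) {
      --queued;
      source.Pull();
    }
  }
};

// Simplified readahead: issue `depth` pulls up front without waiting for
// any of them to finish.
template <typename Gen>
void Readahead(Gen& gen, int depth) {
  for (int i = 0; i < depth; ++i) gen.Pull();
}

// Readahead applied directly to the source: all pulls overlap.
int MaxInFlightDirect(int depth) {
  Source source;
  Readahead(source, depth);
  return source.max_in_flight;
}

// Readahead applied on top of the serializing wrapper: the source only
// ever sees one pull at a time.
int MaxInFlightSerialized(int depth) {
  Source source;
  SerializingWrapper wrapper{source};
  Readahead(wrapper, depth);
  while (source.in_flight > 0) wrapper.FinishOne();  // drain queued pulls
  return source.max_in_flight;
}
```

In this model, a readahead depth of 4 applied directly to the source keeps four reads in flight, while the same depth applied through the serializing wrapper never exceeds one, which matches the one-batch-at-a-time behavior reported here.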