  1. Apache Arrow
  2. ARROW-14026

[C++] Batch readahead not working correctly in Parquet scanner


Details

    Description

      The Parquet scanner implements batch readahead by applying a readahead generator to the generator returned by parquet::arrow::FileReader::GetRecordBatchGenerator. However, that generator is built with MakeConcatenatedGenerator which, regrettably, is documented with this comment:

      > This generator is async-reentrant but will never pull from source reentrantly and will never pull from any subscription reentrantly.

      This effectively prevents any batch readahead from happening, so the file is always read one batch at a time. Part of the problem seems to be that ReadOneRowGroup in reader.cc returns a RecordBatchGenerator when it seems it should be able to return a single RecordBatch. For my testing I changed it to return a single record batch, which let me remove the concatenated generator. Batch readahead then appeared to work properly, but I have not fully confirmed that the change is correct.
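
The blocking effect described above can be illustrated with a small model that uses only the standard library. This is a sketch under stated assumptions, not Arrow's implementation: Source, NonReentrant, and PeakInFlight are hypothetical names, and the callback-based Generator type merely stands in for Arrow's future-based async generators. The model issues several pulls up front, as a readahead layer would, and records how many requests overlap at the source.

```cpp
#include <algorithm>
#include <cassert>
#include <deque>
#include <functional>
#include <utility>

// Invoked when a requested batch becomes available.
using Callback = std::function<void(int)>;
// An asynchronous pull: request one batch, get notified through the callback.
using Generator = std::function<void(Callback)>;

// Toy source standing in for row-group reads. Requests stay pending until
// CompleteOne() is called, so overlapping requests can be counted.
struct Source {
  std::deque<Callback> pending;
  int next_batch = 0;
  int max_in_flight = 0;

  void Pull(Callback cb) {
    pending.push_back(std::move(cb));
    max_in_flight = std::max(max_in_flight, static_cast<int>(pending.size()));
  }

  // Completes the oldest pending request; returns false if none is pending.
  bool CompleteOne() {
    if (pending.empty()) return false;
    Callback cb = std::move(pending.front());
    pending.pop_front();
    cb(next_batch++);
    return true;
  }
};

// Wrapper that never pulls from its source reentrantly: extra requests wait
// in a queue until the outstanding pull has finished.
struct NonReentrant {
  Generator source;
  std::deque<Callback> waiting;
  bool busy = false;

  void Pull(Callback cb) {
    waiting.push_back(std::move(cb));
    Pump();
  }

  void Pump() {
    if (busy || waiting.empty()) return;
    busy = true;
    Callback cb = std::move(waiting.front());
    waiting.pop_front();
    source([this, cb](int batch) {
      busy = false;
      cb(batch);
      Pump();  // only now may the next request reach the source
    });
  }
};

// Issues `depth` pulls up front (what a readahead layer does) and returns the
// peak number of requests that were pending at the source at the same time.
int PeakInFlight(bool serialize, int depth) {
  assert(depth > 0);
  Source src;
  Generator gen = [&src](Callback cb) { src.Pull(std::move(cb)); };
  NonReentrant gate;
  gate.source = gen;  // the gate keeps pulling directly from the source
  if (serialize) gen = [&gate](Callback cb) { gate.Pull(std::move(cb)); };
  for (int i = 0; i < depth; ++i) gen([](int) {});
  while (src.CompleteOne()) {
  }
  return src.max_in_flight;
}
```

In this model, PeakInFlight(false, 4) reports four overlapping requests, while PeakInFlight(true, 4) reports only one: the readahead layer still asks for four batches, but the non-reentrant layer beneath it forwards them to the source one at a time.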

          People

            lidavidm David Li
            westonpace Weston Pace


              Time Tracking

                Estimated: Not Specified
                Remaining: 0h
                Logged: 4h
