  1. Apache Drill
  2. DRILL-3907

Enable ScanBatch to provide a merge on pre-sorted readers


Details

    • Type: Bug
    • Status: Open
    • Priority: Major
    • Resolution: Unresolved
    • Affects Version/s: None
    • Fix Version/s: Future
    • Component/s: Storage - Other
    • Labels: None

    Description

      In some situations, the data returned by individual record readers is presorted by a key. Given Drill's approach to parallelization, a single ScanBatch may interact with multiple readers. To maintain the collation of the underlying data, Drill needs to do an n-way merge on the streams as they are read in. This functionality already exists in the MergingReceiver.
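
      For illustration, the kind of n-way merge described above can be sketched with a min-heap that holds the current head record of each presorted reader. This is a generic sketch, not Drill's MergingReceiver or ScanBatch code: the class name NWayMergeSketch, the Head helper, and the use of plain Iterators in place of record readers and batches are all assumptions made for the example.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Comparator;
import java.util.Iterator;
import java.util.List;
import java.util.PriorityQueue;

/** Hypothetical illustration only; not Drill's MergingReceiver or ScanBatch code. */
public class NWayMergeSketch {

  /** Pairs the current head record of one reader with the rest of that reader's stream. */
  private static final class Head<T> {
    final T value;
    final Iterator<T> rest;

    Head(T value, Iterator<T> rest) {
      this.value = value;
      this.rest = rest;
    }
  }

  /** Merges streams that are each already sorted by {@code order} into one sorted list. */
  public static <T> List<T> merge(List<Iterator<T>> sortedReaders, Comparator<? super T> order) {
    // Min-heap keyed on each reader's current head record.
    PriorityQueue<Head<T>> heap =
        new PriorityQueue<>((a, b) -> order.compare(a.value, b.value));
    for (Iterator<T> reader : sortedReaders) {
      if (reader.hasNext()) {
        heap.add(new Head<>(reader.next(), reader));
      }
    }
    List<T> out = new ArrayList<>();
    while (!heap.isEmpty()) {
      // Emit the smallest head, then refill the heap from the same reader.
      Head<T> smallest = heap.poll();
      out.add(smallest.value);
      if (smallest.rest.hasNext()) {
        heap.add(new Head<>(smallest.rest.next(), smallest.rest));
      }
    }
    return out;
  }

  public static void main(String[] args) {
    // Three stand-in "readers", each presorted by the same key.
    List<Iterator<Integer>> readers = Arrays.asList(
        Arrays.asList(1, 4, 9).iterator(),
        Arrays.asList(2, 3, 10).iterator(),
        Arrays.asList(5, 6).iterator());
    // Prints [1, 2, 3, 4, 5, 6, 9, 10]
    System.out.println(merge(readers, Comparator.<Integer>naturalOrder()));
  }
}
```

      Each output record costs one heap poll and at most one heap insert, so the merge is O(log n) per record for n readers. That per-record cost is the price of preserving collation.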

      This JIRA is to refactor the MergingReceiver so that its underlying n-way merge of batches can be used in other locations. We then need to decide whether to incorporate the merge directly into ScanBatch (when needed) or to handle it externally. We also need to resolve how to decide whether the collation provided by an n-way merge is actually required, to avoid paying the cost of maintaining an unused collation.


          People

            Assignee: Unassigned
            Reporter: Jacques Nadeau (jnadeau)

            Dates

              Created:
              Updated:
