Apache Drill: DRILL-5285 (sub-task of DRILL-5284, "Roll-up of final fixes for managed sort")

Provide detailed, accurate estimate of size consumed by a record batch


Details

    • Type: Sub-task
    • Status: Resolved
    • Priority: Major
    • Resolution: Fixed
    • Affects Version/s: 1.10.0
    • Fix Version/s: 1.11.0
    • Component/s: None
    • Labels: None

    Description

      DRILL-5080 introduced a RecordBatchSizer that estimates the space taken by a record batch and determines batch "density."
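      As a rough illustration of what such a sizer computes, the sketch below sums per-column data bytes and reports density as the percentage of allocated buffer bytes that actually hold data. That definition of density, and the `BatchDensitySketch` and `ColumnSize` names, are assumptions made for illustration; this is not Drill's `RecordBatchSizer` code.

```java
import java.util.List;

// Hypothetical sketch (not Drill code): net data size and "density" of a record batch.
// Assumption: density = percentage of allocated buffer bytes that actually hold data.
final class BatchDensitySketch {

    // Hypothetical per-column measurement.
    static final class ColumnSize {
        final String name;
        final long dataBytes;      // bytes occupied by values
        final long allocatedBytes; // bytes reserved by the column's buffers

        ColumnSize(String name, long dataBytes, long allocatedBytes) {
            this.name = name;
            this.dataBytes = dataBytes;
            this.allocatedBytes = allocatedBytes;
        }
    }

    // Total bytes that hold data across all columns.
    static long netSize(List<ColumnSize> columns) {
        long total = 0;
        for (ColumnSize c : columns) {
            total += c.dataBytes;
        }
        return total;
    }

    // Density as an integer percentage; an empty allocation counts as fully dense.
    static int density(List<ColumnSize> columns) {
        long allocated = 0;
        for (ColumnSize c : columns) {
            allocated += c.allocatedBytes;
        }
        if (allocated == 0) {
            return 100;
        }
        return (int) (netSize(columns) * 100 / allocated);
    }

    public static void main(String[] args) {
        List<ColumnSize> columns = List.of(
                new ColumnSize("id", 4_000, 4_096),
                new ColumnSize("name", 12_000, 32_768));
        System.out.println("net bytes = " + netSize(columns)
                + ", density = " + density(columns) + "%");
    }
}
```

      A low density signals that much of the batch's memory is allocated but unused, which matters to an operator such as the managed sort when it budgets memory per batch.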

      Drill provides a large variety of vectors, each with its own internal structure and set of child vectors. For example, fixed-width vectors use just a data vector. Nullable vectors add an "is set" vector. Variable-length vectors add an offset vector. Repeated vectors add a further offset vector, so a repeated variable-length vector has two.
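      The layouts listed above compose additively. The hypothetical helpers below (not Drill code) show the arithmetic, assuming 4-byte offsets with one more entry than there are values, and a 1-byte "is set" entry per value; Drill's actual widths and allocation rounding may differ.

```java
// Hypothetical sketch (not Drill code) of how buffer sizes add up for each vector layout.
// Assumptions: offsets are 4-byte ints with (count + 1) entries;
// the "is set" vector uses one byte per value.
final class VectorLayoutSketch {
    static final int OFFSET_WIDTH = 4; // assumed bytes per offset entry
    static final int IS_SET_WIDTH = 1; // assumed bytes per "is set" entry

    // Fixed-width: just the data vector.
    static long fixedBytes(int valueCount, int width) {
        return (long) valueCount * width;
    }

    // Nullable fixed-width: data vector plus "is set" vector.
    static long nullableFixedBytes(int valueCount, int width) {
        return fixedBytes(valueCount, width) + (long) valueCount * IS_SET_WIDTH;
    }

    // Variable-length: data bytes plus an offset vector.
    static long variableBytes(int valueCount, long dataBytes) {
        return dataBytes + (long) (valueCount + 1) * OFFSET_WIDTH;
    }

    // Repeated fixed-width: row-level offset vector over a fixed-width element vector.
    static long repeatedFixedBytes(int rowCount, int elementCount, int width) {
        return (long) (rowCount + 1) * OFFSET_WIDTH + fixedBytes(elementCount, width);
    }

    // Repeated variable-length: row-level offsets plus the element vector's own offsets.
    static long repeatedVariableBytes(int rowCount, int elementCount, long dataBytes) {
        return (long) (rowCount + 1) * OFFSET_WIDTH + variableBytes(elementCount, dataBytes);
    }

    public static void main(String[] args) {
        System.out.println("INT x1000:            " + fixedBytes(1000, 4));
        System.out.println("nullable INT x1000:   " + nullableFixedBytes(1000, 4));
        System.out.println("VARCHAR x1000:        " + variableBytes(1000, 10_000));
        System.out.println("repeated VARCHAR:     " + repeatedVariableBytes(100, 1000, 10_000));
    }
}
```

      Even this simplified version needs a separate formula per layout; real vectors add further cases (maps, lists, nested repetition), which is the growth in complexity the ticket describes.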

      The original RecordBatchSizer attempted to compute sizes for all of these vector types, but the complexity got out of hand. This ticket proposes to bite the bullet and move the calculations into each vector type, so that the RecordBatchSizer can simply use their results.
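      A minimal sketch of the proposed design, using hypothetical names (`SizedVector`, `payloadBytes()`, `SimpleBatchSizer`) that are not Drill's actual API: each vector type reports its own size, composite types delegate to the vectors they contain, and the sizer only sums the results. It assumes 4-byte offsets and a 1-byte "is set" entry per value.

```java
import java.util.List;

// Hypothetical interface: each vector reports its own payload size.
interface SizedVector {
    int valueCount();
    long payloadBytes(); // includes internal offset and "is set" vectors
}

final class FixedWidthVector implements SizedVector {
    private final int width;
    private final int valueCount;
    FixedWidthVector(int width, int valueCount) { this.width = width; this.valueCount = valueCount; }
    public int valueCount() { return valueCount; }
    public long payloadBytes() { return (long) valueCount * width; }
}

final class NullableVector implements SizedVector {
    private final SizedVector values;
    NullableVector(SizedVector values) { this.values = values; }
    public int valueCount() { return values.valueCount(); }
    // Assumed: one "is set" byte per value, plus the wrapped data vector.
    public long payloadBytes() { return values.valueCount() + values.payloadBytes(); }
}

final class VarLengthVector implements SizedVector {
    private final int valueCount;
    private final long dataBytes; // total bytes of variable-length data
    VarLengthVector(int valueCount, long dataBytes) { this.valueCount = valueCount; this.dataBytes = dataBytes; }
    public int valueCount() { return valueCount; }
    // Assumed: 4-byte offsets with (valueCount + 1) entries.
    public long payloadBytes() { return dataBytes + 4L * (valueCount + 1); }
}

final class RepeatedVector implements SizedVector {
    private final int rowCount;
    private final SizedVector elements;
    RepeatedVector(int rowCount, SizedVector elements) { this.rowCount = rowCount; this.elements = elements; }
    public int valueCount() { return rowCount; }
    // Row-level offsets plus whatever the element vector reports for itself.
    public long payloadBytes() { return 4L * (rowCount + 1) + elements.payloadBytes(); }
}

final class SimpleBatchSizer {
    // The sizer no longer needs per-type logic: it just sums what each vector reports.
    static long batchBytes(List<SizedVector> vectors) {
        long total = 0;
        for (SizedVector v : vectors) {
            total += v.payloadBytes();
        }
        return total;
    }

    public static void main(String[] args) {
        List<SizedVector> batch = List.of(
                new FixedWidthVector(4, 1000),
                new NullableVector(new FixedWidthVector(8, 1000)),
                new RepeatedVector(100, new VarLengthVector(1000, 10_000)));
        System.out.println("batch payload bytes = " + batchBytes(batch));
    }
}
```

      Because `RepeatedVector` wraps any element vector, a repeated variable-length column picks up its second offset vector automatically; the case-by-case logic lives with each vector type rather than in the sizer.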


          People

            Assignee: Paul Rogers (paul-rogers)
            Reporter: Paul Rogers (paul-rogers)
            Votes: 0
            Watchers: 1
