Apache Drill / DRILL-6211

Optimizations for SelectionVectorRemover

    Details

      Description

      Currently, when a SelectionVectorRemover receives a record batch from an upstream operator (such as a Filter), it immediately starts copying the selected records into a new outgoing batch.
      It could be worthwhile to enrich the RecordBatch with some additional summary statistics about the attached SelectionVector, such as:

      1. number of records that need to be removed/copied
      2. total number of records in the record-batch

      The benefit is that in the extreme cases, where either none or all of the records in a batch are selected, the SelectionVectorRemover can simply drop the record batch or forward it unchanged to the next downstream operator.
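      The drop/forward/copy decision described above can be sketched as follows. This is a minimal, self-contained illustration under stated assumptions, not Drill's operator code: the class `SvrDecisionSketch`, the record `SelectionSummary`, the enum `Action`, and the use of a plain `int[]` column with an `int[]` selection vector are hypothetical stand-ins for Drill's `RecordBatch` and selection vectors. It requires Java 16 or later (for `record`). A real implementation would also need to present a forwarded batch to downstream operators without its selection vector; that step is omitted here.

```java
import java.util.Arrays;

// Hypothetical sketch: names and types are illustrative, not Drill APIs.
public class SvrDecisionSketch {

    // Hypothetical summary statistics attached to an incoming batch.
    record SelectionSummary(int selectedCount, int totalCount) {}

    enum Action { DROP, FORWARD, COPY }

    // Decide what to do with a batch using only the summary statistics.
    // Assumes selection indices are distinct and in range, as a filter would produce.
    static Action decide(SelectionSummary s) {
        if (s.selectedCount() == 0) {
            return Action.DROP;      // nothing selected: emit nothing
        }
        if (s.selectedCount() == s.totalCount()) {
            return Action.FORWARD;   // everything selected: pass the batch through
        }
        return Action.COPY;          // partial selection: copy only selected rows
    }

    // Copy path (today's unconditional behavior): gather the selected rows.
    static int[] copySelected(int[] column, int[] selection) {
        int[] out = new int[selection.length];
        for (int i = 0; i < selection.length; i++) {
            out[i] = column[selection[i]];
        }
        return out;
    }

    // Process one batch; returns null when the batch is dropped.
    static int[] process(int[] column, int[] selection) {
        SelectionSummary summary = new SelectionSummary(selection.length, column.length);
        return switch (decide(summary)) {
            case DROP -> null;
            case FORWARD -> column;  // same array, no copy
            case COPY -> copySelected(column, selection);
        };
    }

    public static void main(String[] args) {
        int[] column = {10, 20, 30, 40};
        System.out.println(process(column, new int[] {}));                        // prints null
        System.out.println(process(column, new int[] {0, 1, 2, 3}) == column);    // prints true
        System.out.println(Arrays.toString(process(column, new int[] {1, 3})));  // prints [20, 40]
    }
}
```

      Only the COPY branch touches row data; the other two branches cost a comparison each, which is where the savings at 0% and 100% selectivity would come from.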

      The extreme case of dropping the batch already works reasonably well, because there is nothing to copy. For batches that should pass through unchanged, however, the copying overhead remains, and it accounts for more than 35% of the time if the streaming aggregation cost within the tests is discounted.

      The following measurements show the time spent in the SelectionVectorRemover (SVR) at different filter selectivities:

      Selectivity  Query Time  % Time used by SVR  SVR Time   Profile
      0%           6.996       0.13%               0.0090948  255d264c-f55e-b343-0bef-49d3e672d93f.sys.drill
      10%          7.836       7.97%               0.6245292  255d2682-8481-bed0-fc22-197a75371c04.sys.drill
      50%          11.225      25.59%              2.8724775  255d2664-2418-19e0-00ea-2076a06572a2.sys.drill
      90%          14.966      33.91%              5.0749706  255d26ae-2c0b-6cd6-ae71-4ad04c992daf.sys.drill
      100%         19.003      35.73%              6.7897719  255d2880-48a2-d86b-5410-29ce0cd249ed.sys.drill
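      The "% Time used by SVR" column equals the SVR time divided by the total query time. The short check below recomputes that column from the table's own values and adds no new measurements; the class name `SvrShare` is only for illustration.

```java
// Recomputes the "% Time used by SVR" column from the table above.
public class SvrShare {

    // Share of total query time spent in the SVR, as a percentage.
    static double svrPercent(double svrTime, double queryTime) {
        return 100.0 * svrTime / queryTime;
    }

    public static void main(String[] args) {
        String[] selectivity = {"0%", "10%", "50%", "90%", "100%"};
        double[] queryTime = {6.996, 7.836, 11.225, 14.966, 19.003};
        double[] svrTime = {0.0090948, 0.6245292, 2.8724775, 5.0749706, 6.7897719};
        for (int i = 0; i < queryTime.length; i++) {
            System.out.printf("%-5s %6.2f%%%n", selectivity[i], svrPercent(svrTime[i], queryTime[i]));
        }
    }
}
```

      It prints 0.13%, 7.97%, 25.59%, 33.91% and 35.73%, matching the table: the SVR's share grows with selectivity because more records pass the filter and must be copied.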

      To summarize, the SVR should avoid creating new batches as much as possible.

      A more general (and non-trivial) optimization would also take into account that multiple emitted batches could be coalesced, but we do not currently have test metrics for that.
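      A minimal sketch of the coalescing idea, assuming a hypothetical fixed output capacity and plain `int[]` columns in place of Drill's value vectors: the selected rows from several small incoming batches are accumulated into one outgoing buffer, which is emitted only when it is full or when the input ends. `BatchCoalescer`, `add`, `flush` and `emitted` are illustrative names, not Drill APIs, and memory accounting is ignored.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

// Hypothetical coalescer: illustrative only, not a Drill API.
public class BatchCoalescer {
    private final int capacity;                      // rows per outgoing batch (assumed)
    private final int[] buffer;                      // rows accumulated so far
    private int size = 0;
    private final List<int[]> emitted = new ArrayList<>();

    public BatchCoalescer(int capacity) {
        this.capacity = capacity;
        this.buffer = new int[capacity];
    }

    // Append the selected rows of one incoming batch, emitting full batches as they fill.
    public void add(int[] column, int[] selection) {
        for (int idx : selection) {
            buffer[size++] = column[idx];
            if (size == capacity) {
                flush();
            }
        }
    }

    // Emit whatever is buffered (called when full and at end of input).
    public void flush() {
        if (size > 0) {
            emitted.add(Arrays.copyOf(buffer, size));
            size = 0;
        }
    }

    public List<int[]> emitted() {
        return emitted;
    }

    public static void main(String[] args) {
        BatchCoalescer c = new BatchCoalescer(4);
        c.add(new int[] {1, 2, 3}, new int[] {0, 2});     // selects 1, 3
        c.add(new int[] {4, 5, 6}, new int[] {1});        // selects 5
        c.add(new int[] {7, 8, 9}, new int[] {0, 1, 2});  // selects 7, 8, 9
        c.flush();                                        // end of input
        for (int[] batch : c.emitted()) {
            System.out.println(Arrays.toString(batch));
        }
    }
}
```

      With a capacity of 4, the three small input batches produce two outgoing batches, [1, 3, 5, 7] and [8, 9], instead of three. A real implementation would also have to bound memory and respect downstream batch-size limits, which this sketch does not attempt.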

              People

              • Assignee: Karthikeyan Manivannan (karthikm)
                Reporter: Kunal Khatua (kkhatua)
                Reviewer: Aman Sinha
              • Votes: 0
                Watchers: 1