The Hash-Join and Hash-Aggr operators copy each incoming row separately. When the incoming data carries a selection vector (e.g., when it comes out of a Filter), a SelectionVectorRemover is added before the Hash operator, because the Hash operator cannot handle the selection vector.
Thus every row is copied twice, once by the SelectionVectorRemover and once by the Hash operator, and the first copy is needless!
Enhance the Hash operators to handle potential incoming selection vectors, thus eliminating the need for the extra copy. The planner needs to be changed so that it no longer adds the SelectionVectorRemover before these operators.
- Note the special case of Hash-Join with num_partitions = 1, where the build side vectors are used as is, not copied.
- This conflicts with the suggestion in DRILL-5912 not to copy probe vectors.
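
The difference between the two copy paths can be sketched in plain Java. This is only an illustration under simplifying assumptions: a batch is modeled as an `int[]`, the selection vector as an `int[]` of row indices, and the class and method names (`SelectionVectorCopySketch`, `removeThenCopy`, `copyThroughSelection`) are hypothetical, not Drill classes or APIs.

```java
import java.util.Arrays;

// Hypothetical stand-in for illustration only; none of this is Drill code.
final class SelectionVectorCopySketch {

    // Current behavior, roughly: the selected rows are first compacted
    // (the SelectionVectorRemover step), then each compacted row is copied
    // again by the hash operator. copyCounter[0] counts row copies.
    static int[] removeThenCopy(int[] incoming, int[] selection, int[] copyCounter) {
        int[] compacted = new int[selection.length];
        for (int i = 0; i < selection.length; i++) {
            compacted[i] = incoming[selection[i]];
            copyCounter[0]++;
        }
        int[] partition = new int[compacted.length];
        for (int i = 0; i < compacted.length; i++) {
            partition[i] = compacted[i];
            copyCounter[0]++;
        }
        return partition;
    }

    // Proposed behavior, roughly: the hash operator dereferences the
    // selection vector itself, so each selected row is copied once.
    static int[] copyThroughSelection(int[] incoming, int[] selection, int[] copyCounter) {
        int[] partition = new int[selection.length];
        for (int i = 0; i < selection.length; i++) {
            partition[i] = incoming[selection[i]];
            copyCounter[0]++;
        }
        return partition;
    }

    public static void main(String[] args) {
        int[] incoming = {10, 20, 30, 40, 50};
        int[] selection = {1, 3, 4}; // rows that passed the filter
        int[] twoStep = {0};
        int[] oneStep = {0};
        int[] a = removeThenCopy(incoming, selection, twoStep);
        int[] b = copyThroughSelection(incoming, selection, oneStep);
        System.out.println(Arrays.toString(a) + " copies=" + twoStep[0]); // [20, 40, 50] copies=6
        System.out.println(Arrays.toString(b) + " copies=" + oneStep[0]); // [20, 40, 50] copies=3
    }
}
```

Both paths produce the same rows; the second simply performs half as many row copies. The sketch does not model the num_partitions = 1 case noted above, where the build side vectors are not copied at all.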
And the plan: