Details
-
Improvement
-
Status: Resolved
-
Minor
-
Resolution: Later
-
Impala 2.6.0
-
None
Description
void Sorter::TupleSorter::SortHelper(TupleIterator first, TupleIterator last) { if (UNLIKELY(state_->is_cancelled())) return; // Use insertion sort for smaller sequences. while (last.index_ - first.index_ > INSERTION_THRESHOLD) { TupleIterator iter(this, first.index_ + (last.index_ - first.index_) / 2); DCHECK(iter.current_tuple_ != NULL); // Partition() splits the tuples in [first, last) into two groups (<= pivot // and >= pivot) in-place. 'cut' is the index of the first tuple in the second group. TupleIterator cut = Partition(first, last, reinterpret_cast<Tuple*>(iter.current_tuple_)); SortHelper(cut, last); last = cut; if (UNLIKELY(state_->is_cancelled())) return; } InsertionSort(first, last); }
The quicksort implementation in the sorter is based on dividing the input into two partitions: <= pivot and >= pivot.
If all of the input values in a partition are equal, then it will still recursively divide and do insertion sort on the values. We could change the sorter to partition the input into three partitions: <, == and >. Then it doesn't need to recurse on the middle partition. This would mean it could sort a partition full of duplicate values in a single pass over the input.
Attachments
Issue Links
- is duplicated by
-
IMPALA-10961 Implement 3-way quicksort in sorter
- Resolved
- is related to
-
IMPALA-6044 Analytic planner misses some opportunities to merge partition groups
- Open
- relates to
-
IMPALA-3354 Sorter can overflow stack on large sorts
- Resolved