Details
-
Improvement
-
Status: Resolved
-
Minor
-
Resolution: Fixed
-
Impala 2.7.0
Description
In the sorter, we codegen the comparator function but call it indirectly via a function pointer. We should consider codegening the perf-critical loops so that we can make the comparator function call direct and inlinable. Inlining the comparison will be very beneficial if it is trivial, e.g. order by a numeric column: I expect sorts on simple keys will get noticably faster.
We should also be able to get rid of FreeLocalAllocations() calls for most comparators, although I'm not sure what the best way to approach that is.
The Partition() loop is the most perf-critical, followed by InsertionSort().
We also don't do this yet for the TopN node, see IMPALA-3815.
Mostafa's analysis:
While evaluating Sort performance I noticed that the codegened compare function is not inlined which results in large overhead per row.
Expected speedup is 10-15%
/// Returns a negative value if lhs is less than rhs, a positive value if lhs is /// greater than rhs, or 0 if they are equal. All exprs (ordering_exprs_lhs_ and /// ordering_exprs_rhs_) must have been prepared and opened before calling this, /// i.e. 'sort_key_exprs' in the constructor must have been opened. int ALWAYS_INLINE Compare(const TupleRow* lhs, const TupleRow* rhs) const { return codegend_compare_fn_ == NULL ? CompareInterpreted(lhs, rhs) : (*codegend_compare_fn_)(ordering_expr_evals_lhs_.data(), ordering_expr_evals_rhs_.data(), lhs, rhs); }
From Perf
│ bool Sorter::TupleSorter::Less(const TupleRow* lhs, const TupleRow* rhs) { ▒ 7.43 │ push %rbp ▒ 3.23 │ mov %rsp,%rbp ▒ 9.44 │ push %r12 ▒ 2.69 │ push %rbx ▒ 3.89 │ mov %rsi,%r12 ▒ 2.98 │ mov %rdi,%rbx ▒ 6.06 │ sub $0x10,%rsp ◆ │ --num_comparisons_till_free_; ▒ │ DCHECK_GE(num_comparisons_till_free_, 0); ▒ │ if (UNLIKELY(num_comparisons_till_free_ == 0)) { ▒ 3.75 │ subl $0x1,0x18(%rdi) ▒ 9.42 │ ↓ je 58 ▒ │ parent_->expr_results_pool_.Clear(); ▒ │ num_comparisons_till_free_ = state_->batch_size(); ▒ │ } ▒ │ return comparator_.Less(lhs, rhs); ▒ 1.09 │17: mov 0x10(%rbx),%rdi ▒ │ /// Returns a negative value if lhs is less than rhs, a positive value if lhs is ▒ │ /// greater than rhs, or 0 if they are equal. All exprs (ordering_exprs_lhs_ and ▒ │ /// ordering_exprs_rhs_) must have been prepared and opened before calling this, ▒ │ /// i.e. 'sort_key_exprs' in the constructor must have been opened. ▒ │ int ALWAYS_INLINE Compare(const TupleRow* lhs, const TupleRow* rhs) const { ▒ │ return codegend_compare_fn_ == NULL ? ▒ 2.69 │ mov 0x58(%rdi),%rax ▒ │ CompareInterpreted(lhs, rhs) : ▒ │ (*codegend_compare_fn_)(ordering_expr_evals_lhs_.data(), ▒ │ ordering_expr_evals_rhs_.data(), lhs, rhs); ▒ 5.43 │ test %rax,%rax ▒ │ ↓ je 40 ▒ 6.85 │ mov 0x20(%rdi),%rsi ▒ 0.86 │ mov %rdx,%rcx ▒ 2.55 │ mov 0x8(%rdi),%rdi ▒ 3.38 │ mov %r12,%rdx ▒ 6.10 │ → callq *(%rax) ▒ │ } ▒ 5.84 │ add $0x10,%rsp ▒ │ /// All exprs (ordering_exprs_lhs_ and ordering_exprs_rhs_) must have been prepared ▒ │ /// and opened before calling this. ▒ │ /// Force inlining because it tends not to be always inlined at callsites, even in ▒ │ /// hot loops. ▒ │ bool ALWAYS_INLINE Less(const TupleRow* lhs, const TupleRow* rhs) const { ▒ │ return Compare(lhs, rhs) < 0; ▒ 1.77 │ shr $0x1f,%eax ▒ 7.91 │ pop %rbx ▒ 4.11 │ pop %r12 ▒ 0.49 │ pop %rbp ▒ 1.75 │ ← retq ▒ │ /// i.e. 'sort_key_exprs' in the constructor must have been opened. ▒ │ int ALWAYS_INLINE Compare(const TupleRow* lhs, const TupleRow* rhs) const { ▒ │ return codegend_compare_fn_ == NULL ? ▒ │ CompareInterpreted(lhs, rhs) : ▒ │ (*codegend_compare_fn_)(ordering_expr_evals_lhs_.data(), ▒ │ ordering_expr_evals_rhs_.data(), lhs, rhs); ▒ │40: mov %r12,%rsi ▒ │ → callq impala::TupleRowComparator::CompareInterpreted(impala::TupleRow const*, impala::TupleRow const*) const ▒ │ add $0x10,%rsp ▒ │ /// All exprs (ordering_exprs_lhs_ and ordering_exprs_rhs_) must have been prepared ▒ Press 'h' for help on key bindings
Attachments
Attachments
Issue Links
- is duplicated by
-
IMPALA-6744 Inline codegend_compare_fn_ to avoid per row memory loads and function call
- Resolved
- is related to
-
IMPALA-4065 Inline comparator calls into TopN::InsertBatch()
- Resolved