Uploaded image for project: 'IMPALA'
  1. IMPALA
  2. IMPALA-3816

Codegen perf-critical loops in Sorter

    XMLWordPrintableJSON

Details

    • Improvement
    • Status: Resolved
    • Minor
    • Resolution: Fixed
    • Impala 2.7.0
    • Impala 4.0.0
    • Backend

    Description

      In the sorter, we codegen the comparator function but call it indirectly via a function pointer. We should consider codegening the perf-critical loops so that we can make the comparator function call direct and inlinable. Inlining the comparison will be very beneficial if it is trivial, e.g. order by a numeric column: I expect sorts on simple keys will get noticably faster.

      We should also be able to get rid of FreeLocalAllocations() calls for most comparators, although I'm not sure what the best way to approach that is.

      The Partition() loop is the most perf-critical, followed by InsertionSort().

      We also don't do this yet for the TopN node, see IMPALA-3815.

      Mostafa's analysis:

      While evaluating Sort performance I noticed that the codegened compare function is not inlined which results in large overhead per row.

      Expected speedup is 10-15%

        /// Returns a negative value if lhs is less than rhs, a positive value if lhs is
        /// greater than rhs, or 0 if they are equal. All exprs (ordering_exprs_lhs_ and
        /// ordering_exprs_rhs_) must have been prepared and opened before calling this,
        /// i.e. 'sort_key_exprs' in the constructor must have been opened.
        int ALWAYS_INLINE Compare(const TupleRow* lhs, const TupleRow* rhs) const {
          return codegend_compare_fn_ == NULL ?
              CompareInterpreted(lhs, rhs) :
              (*codegend_compare_fn_)(ordering_expr_evals_lhs_.data(),
                  ordering_expr_evals_rhs_.data(), lhs, rhs);
        } 
      

      From Perf

            │    bool Sorter::TupleSorter::Less(const TupleRow* lhs, const TupleRow* rhs) {                                                                                                                                                           ▒
        7.43 │      push   %rbp                                                                                                                                                                                                                        ▒
        3.23 │      mov    %rsp,%rbp                                                                                                                                                                                                                   ▒
        9.44 │      push   %r12                                                                                                                                                                                                                        ▒
        2.69 │      push   %rbx                                                                                                                                                                                                                        ▒
        3.89 │      mov    %rsi,%r12                                                                                                                                                                                                                   ▒
        2.98 │      mov    %rdi,%rbx                                                                                                                                                                                                                   ▒
        6.06 │      sub    $0x10,%rsp                                                                                                                                                                                                                  ◆
             │      --num_comparisons_till_free_;                                                                                                                                                                                                      ▒
             │      DCHECK_GE(num_comparisons_till_free_, 0);                                                                                                                                                                                          ▒
             │      if (UNLIKELY(num_comparisons_till_free_ == 0)) {                                                                                                                                                                                   ▒
        3.75 │      subl   $0x1,0x18(%rdi)                                                                                                                                                                                                             ▒
        9.42 │    ↓ je     58                                                                                                                                                                                                                          ▒
             │        parent_->expr_results_pool_.Clear();                                                                                                                                                                                             ▒
             │        num_comparisons_till_free_ = state_->batch_size();                                                                                                                                                                               ▒
             │      }                                                                                                                                                                                                                                  ▒
             │      return comparator_.Less(lhs, rhs);                                                                                                                                                                                                 ▒
        1.09 │17:   mov    0x10(%rbx),%rdi                                                                                                                                                                                                             ▒
             │      /// Returns a negative value if lhs is less than rhs, a positive value if lhs is                                                                                                                                                   ▒
      /// greater than rhs, or 0 if they are equal. All exprs (ordering_exprs_lhs_ and                                                                                                                                                   ▒
      /// ordering_exprs_rhs_) must have been prepared and opened before calling this,                                                                                                                                                   ▒
      /// i.e. 'sort_key_exprs' in the constructor must have been opened.                                                                                                                                                                ▒
      int ALWAYS_INLINE Compare(const TupleRow* lhs, const TupleRow* rhs) const {                                                                                                                                                        ▒
             │        return codegend_compare_fn_ == NULL ?                                                                                                                                                                                            ▒
        2.69 │      mov    0x58(%rdi),%rax                                                                                                                                                                                                             ▒
             │            CompareInterpreted(lhs, rhs) :                                                                                                                                                                                               ▒
             │            (*codegend_compare_fn_)(ordering_expr_evals_lhs_.data(),                                                                                                                                                                     ▒
             │                ordering_expr_evals_rhs_.data(), lhs, rhs);                                                                                                                                                                              ▒
        5.43 │      test   %rax,%rax                                                                                                                                                                                                                   ▒
             │    ↓ je     40                                                                                                                                                                                                                          ▒
        6.85 │      mov    0x20(%rdi),%rsi                                                                                                                                                                                                             ▒
        0.86 │      mov    %rdx,%rcx                                                                                                                                                                                                                   ▒
        2.55 │      mov    0x8(%rdi),%rdi                                                                                                                                                                                                              ▒
        3.38 │      mov    %r12,%rdx                                                                                                                                                                                                                   ▒
        6.10 │    → callq  *(%rax)                                                                                                                                                                                                                     ▒
             │    }                                                                                                                                                                                                                                    ▒
        5.84 │      add    $0x10,%rsp                                                                                                                                                                                                                  ▒
             │      /// All exprs (ordering_exprs_lhs_ and ordering_exprs_rhs_) must have been prepared                                                                                                                                                ▒
      /// and opened before calling this.                                                                                                                                                                                                ▒
      /// Force inlining because it tends not to be always inlined at callsites, even in                                                                                                                                                 ▒
      /// hot loops.                                                                                                                                                                                                                     ▒
             │      bool ALWAYS_INLINE Less(const TupleRow* lhs, const TupleRow* rhs) const {                                                                                                                                                          ▒
             │        return Compare(lhs, rhs) < 0;                                                                                                                                                                                                    ▒
        1.77 │      shr    $0x1f,%eax                                                                                                                                                                                                                  ▒
        7.91 │      pop    %rbx                                                                                                                                                                                                                        ▒
        4.11 │      pop    %r12                                                                                                                                                                                                                        ▒
        0.49 │      pop    %rbp                                                                                                                                                                                                                        ▒
        1.75 │    ← retq                                                                                                                                                                                                                               ▒
             │      /// i.e. 'sort_key_exprs' in the constructor must have been opened.                                                                                                                                                                ▒
      int ALWAYS_INLINE Compare(const TupleRow* lhs, const TupleRow* rhs) const {                                                                                                                                                        ▒
             │        return codegend_compare_fn_ == NULL ?                                                                                                                                                                                            ▒
             │            CompareInterpreted(lhs, rhs) :                                                                                                                                                                                               ▒
             │            (*codegend_compare_fn_)(ordering_expr_evals_lhs_.data(),                                                                                                                                                                     ▒
             │                ordering_expr_evals_rhs_.data(), lhs, rhs);                                                                                                                                                                              ▒
             │40:   mov    %r12,%rsi                                                                                                                                                                                                                   ▒
             │    → callq  impala::TupleRowComparator::CompareInterpreted(impala::TupleRow const*, impala::TupleRow const*) const                                                                                                                      ▒
             │      add    $0x10,%rsp                                                                                                                                                                                                                  ▒
             │      /// All exprs (ordering_exprs_lhs_ and ordering_exprs_rhs_) must have been prepared                                                                                                                                                ▒
      Press 'h' for help on key bindings
      

      Attachments

        1. 7e455a394bf2622b_adcea7ad00000000_opt.ll
          345 kB
          Qifan Chen
        2. tpch_30.txt
          4 kB
          Tianyi Wang
        3. percentile query profile.txt
          30 kB
          Tim Armstrong

        Issue Links

          Activity

            People

              sql_forever Qifan Chen
              tarmstrong Tim Armstrong
              Votes:
              0 Vote for this issue
              Watchers:
              5 Start watching this issue

              Dates

                Created:
                Updated:
                Resolved: