IMPALA-2681

Improve Decimal arithmetic performance by using a cheaper overflow check


Details

    Description

      A profile shows that 25% of the CPU time for TPC-H Q1 is spent in __divti3. The division is part of the Decimal overflow check in ./be/src/runtime/decimal-value.h:

        template<typename RESULT_T>
        DecimalValue<RESULT_T> Multiply(int this_scale, const DecimalValue& other,
            int other_scale, int result_scale, bool* overflow) const {
          // In the non-overflow case, we don't need to adjust by the scale since
          // that is already handled by the FE when it computes the result decimal type.
          // e.g. 1.23 * .2 (scale 2, scale 1 respectively) is identical to:
          // 123 * 2 with a resulting scale 3. We can do the multiply on the unscaled values.
          // The result scale in this case is the sum of the input scales.
          RESULT_T x = value();
          RESULT_T y = other.value();
          if (x == 0 || y == 0) {
            // Handle zero to avoid divide by zero in the overflow check below.
            return DecimalValue<RESULT_T>(0);
          }
          if (sizeof(RESULT_T) == 16) {
            // Check overflow
            *overflow |= DecimalUtil::MAX_UNSCALED_DECIMAL / abs(y) < abs(x);
          }
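
One possible way to avoid the 128-bit division in the check above is to bound the product by the bit lengths of the operands and only then multiply in unsigned 128-bit arithmetic. The following is a minimal sketch under stated assumptions, not the patch that resolved this issue: all helper names (MaxUnscaledDecimal16, AbsU128, BitLength, MultiplyOverflows, MultiplyOverflowsByDivision) are hypothetical, it assumes the 16-byte limit is 10^38 - 1, and it relies on the GCC/Clang extensions __int128 and __builtin_clzll. The reasoning: if the bit lengths of |x| and |y| sum to more than 128, the product is at least 2^127, which exceeds 10^38, so it overflows; otherwise the product is below 2^128, fits in an unsigned 128-bit integer, and can be compared against the limit directly.

```cpp
#include <cstdint>

// Hypothetical helper names for illustration; not taken from the Impala source.
// Requires GCC or Clang for __int128 and __builtin_clzll.
using int128 = __int128;
using uint128 = unsigned __int128;

// Assumed limit for a 16-byte decimal: 10^38 - 1 (38 decimal digits).
inline uint128 MaxUnscaledDecimal16() {
  uint128 v = 1;
  for (int i = 0; i < 38; ++i) v *= 10;
  return v - 1;
}

// Magnitude of v as an unsigned value (well defined for every input).
inline uint128 AbsU128(int128 v) {
  return v < 0 ? uint128(0) - static_cast<uint128>(v) : static_cast<uint128>(v);
}

// Number of significant bits in v; 0 for v == 0.
inline int BitLength(uint128 v) {
  const uint64_t hi = static_cast<uint64_t>(v >> 64);
  const uint64_t lo = static_cast<uint64_t>(v);
  if (hi != 0) return 128 - __builtin_clzll(hi);
  if (lo != 0) return 64 - __builtin_clzll(lo);
  return 0;
}

// Division-free overflow check for x * y against the assumed decimal limit.
inline bool MultiplyOverflows(int128 x, int128 y) {
  if (x == 0 || y == 0) return false;
  const uint128 ax = AbsU128(x);
  const uint128 ay = AbsU128(y);
  // ax >= 2^(bx-1) and ay >= 2^(by-1), so bx + by > 128 implies
  // ax * ay >= 2^127 > 10^38: a guaranteed overflow.
  if (BitLength(ax) + BitLength(ay) > 128) return true;
  // Here ax * ay < 2^128, so the unsigned product cannot wrap around.
  return ax * ay > MaxUnscaledDecimal16();
}

// The division-based check from the excerpt, kept only for comparison.
inline bool MultiplyOverflowsByDivision(int128 x, int128 y) {
  if (x == 0 || y == 0) return false;
  return MaxUnscaledDecimal16() / AbsU128(y) < AbsU128(x);
}
```

In the common case both operands are small, so the check costs two leading-zero counts and one multiplication, with no call into __divti3. Whether this is faster in practice would need to be confirmed by profiling the same query.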
      
      Data Of Interest (CPU Metrics)
      1 of 2: 68.5% (6.414s of 9.362s)
      
      libgcc_s.so.1!__divti3 - [Unknown]
      impalad!Multiply<__int128>+0x69 - decimal-value.h:222
      impalad!impala::DecimalOperators::Multiply_DecimalVal_DecimalVal+0x1b4 - decimal-operators.cc:687
      impalad!impala::ScalarFnCall::InterpretEval<impala_udf::DecimalVal>+0x3e1 - scalar-fn-call.cc:546
      impalad!impala::ScalarFnCall::GetDecimalVal+0x2f - scalar-fn-call.cc:743
      impalad!impala::ExprContext::GetValue+0x225 - expr-context.cc:276
      impalad!impala::AggFnEvaluator::Update+0x91 - agg-fn-evaluator.cc:346
      impalad!impala::AggFnEvaluator::Add+0x11 - agg-fn-evaluator.h:241
      impalad!impala::PartitionedAggregationNode::UpdateTuple+0x40 - partitioned-aggregation-node.cc:732
      
      select
      	l_returnflag,
      	l_linestatus,
      	sum(l_quantity) as sum_qty,
      	sum(l_extendedprice) as sum_base_price,
      	sum(l_extendedprice * (1 - l_discount)) as sum_disc_price,
      	sum(l_extendedprice * (1 - l_discount) * (1 + l_tax)) as sum_charge,
      	avg(l_quantity) as avg_qty,
      	avg(l_extendedprice) as avg_price,
      	avg(l_discount) as avg_disc,
      	count(*) as count_order
      from
      	lineitem
      where
      	l_shipdate <= '1998-09-16'
      group by
      	l_returnflag,
      	l_linestatus
      order by
      	l_returnflag,
      	l_linestatus;
      

      Decimal performance is a frequent request from financial institutions.

            People

              zuowang_impala_c24e Zuo Wang
              mmokhtar Mostafa Mokhtar