Hive / HIVE-7232

VectorReduceSink is emitting incorrect JOIN keys

Details

    • Type: Bug
    • Status: Closed
    • Priority: Major
    • Resolution: Fixed
    • Affects Version/s: 0.14.0
    • Fix Version/s: 0.14.0
    • Component/s: Query Processor
    • Labels: None
    • Hadoop Flags: Reviewed
    • Release Note: VectorReduceSink is emitting incorrect JOIN keys (Navis, via Gopal V)

    Description

      After HIVE-7121, TPC-H query 5 returns incorrect results.

      Thanks to Navis, the problem has been tracked down to the auto-parallel settings, which are initialized for ReduceSinkOperator but not for VectorReduceSinkOperator. The vectorized version inherits from ReduceSinkOperator, but it neither calls super.initializeOp() nor sets up the variable correctly from ReduceSinkDesc.
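
The inheritance pitfall described above can be sketched in isolation. This is a minimal illustration, not Hive's code: `SinkDesc`, `BaseSink`, `BrokenVectorSink`, `FixedVectorSink`, and the `autoParallel` field are hypothetical stand-ins for ReduceSinkDesc, ReduceSinkOperator, VectorReduceSinkOperator, and whatever variable the real operator derives from its descriptor.

```java
// Hypothetical stand-ins for the classes named in the description; not Hive's actual code.
class SinkDesc {
    final boolean autoParallel;
    SinkDesc(boolean autoParallel) { this.autoParallel = autoParallel; }
}

class BaseSink {
    protected final SinkDesc conf;
    protected boolean autoParallel; // stays false until initializeOp() copies it from conf

    BaseSink(SinkDesc conf) { this.conf = conf; }

    protected void initializeOp() {
        // The base class reads the setting from its descriptor.
        autoParallel = conf.autoParallel;
    }

    boolean usesAutoParallel() { return autoParallel; }
}

class BrokenVectorSink extends BaseSink {
    BrokenVectorSink(SinkDesc conf) { super(conf); }

    @Override
    protected void initializeOp() {
        // Subclass-specific setup only: the inherited field is never set.
    }
}

class FixedVectorSink extends BaseSink {
    FixedVectorSink(SinkDesc conf) { super(conf); }

    @Override
    protected void initializeOp() {
        super.initializeOp(); // pick up the shared settings first
        // Subclass-specific setup would follow here.
    }
}

class InitSketchDemo {
    public static void main(String[] args) {
        SinkDesc desc = new SinkDesc(true);
        BaseSink broken = new BrokenVectorSink(desc);
        BaseSink fixed = new FixedVectorSink(desc);
        broken.initializeOp();
        fixed.initializeOp();
        // prints "broken=false fixed=true"
        System.out.println("broken=" + broken.usesAutoParallel()
                + " fixed=" + fixed.usesAutoParallel());
    }
}
```

With the same descriptor, the subclass that skips the base initialization silently keeps the field's default value. Under these assumptions, that is how two sinks feeding the same join could end up configured differently.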

      The query is TPC-H query 5, with extra NULL checks added as a precaution.

      SELECT n_name,
             sum(l_extendedprice * (1 - l_discount)) AS revenue
      FROM customer,
           orders,
           lineitem,
           supplier,
           nation,
           region
      WHERE c_custkey = o_custkey
        AND l_orderkey = o_orderkey
        AND l_suppkey = s_suppkey
        AND c_nationkey = s_nationkey
        AND s_nationkey = n_nationkey
        AND n_regionkey = r_regionkey
        AND r_name = 'ASIA'
        AND o_orderdate >= '1994-01-01'
        AND o_orderdate < '1995-01-01'
        AND l_orderkey IS NOT NULL
        AND c_custkey IS NOT NULL
        AND l_suppkey IS NOT NULL
        AND c_nationkey IS NOT NULL
        AND s_nationkey IS NOT NULL
        AND n_regionkey IS NOT NULL
      GROUP BY n_name
      ORDER BY revenue DESC;
      

      The reducer that exhibits the issue has the following plan:

      Reducer 3
                  Reduce Operator Tree:
                    Join Operator
                      condition map:
                           Inner Join 0 to 1
                      condition expressions:
                        0 {KEY.reducesinkkey0} {VALUE._col2}
                        1 {VALUE._col0} {KEY.reducesinkkey0} {VALUE._col3}
                      outputColumnNames: _col0, _col3, _col10, _col11, _col14
                      Statistics: Num rows: 183333344 Data size: 95229140992 Basic stats: COMPLETE Column stats: NONE
                      Reduce Output Operator
                        key expressions: _col10 (type: int)
                        sort order: +
                        Map-reduce partition columns: _col10 (type: int)
                        Statistics: Num rows: 183333344 Data size: 95229140992 Basic stats: COMPLETE Column stats: NONE
                        value expressions: _col0 (type: int), _col3 (type: int), _col11 (type: int), _col14 (type: string)
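
The issue does not spell out the exact mechanism, but a toy model shows why inconsistent sink setup can corrupt a join like the one in this plan. It assumes rows are routed to reducers by a hash of the join key, and that the two inputs of the join end up using different hash functions. The class `PartitionSketch`, both hash functions, and the reducer count are illustrative assumptions, not Hive internals.

```java
import java.util.Collections;
import java.util.HashMap;
import java.util.HashSet;
import java.util.Map;
import java.util.Set;
import java.util.function.IntUnaryOperator;

// Toy model of hash partitioning for a reduce-side join; not Hive's implementation.
class PartitionSketch {
    // Route each key to a reducer using the given hash function.
    static Map<Integer, Set<Integer>> route(int[] keys, IntUnaryOperator hash, int reducers) {
        Map<Integer, Set<Integer>> byReducer = new HashMap<>();
        for (int k : keys) {
            int r = Math.floorMod(hash.applyAsInt(k), reducers);
            byReducer.computeIfAbsent(r, x -> new HashSet<>()).add(k);
        }
        return byReducer;
    }

    // Count keys that meet on the same reducer from both sides (inner-join matches).
    static int joinedKeys(Map<Integer, Set<Integer>> left, Map<Integer, Set<Integer>> right) {
        int matches = 0;
        for (Map.Entry<Integer, Set<Integer>> e : left.entrySet()) {
            Set<Integer> other = right.getOrDefault(e.getKey(), Collections.emptySet());
            for (int k : e.getValue()) {
                if (other.contains(k)) {
                    matches++;
                }
            }
        }
        return matches;
    }

    public static void main(String[] args) {
        int[] keys = {1, 2, 3, 4, 5, 6, 7, 8};
        int reducers = 4;
        IntUnaryOperator plain = k -> k;       // hash used by one sink
        IntUnaryOperator shifted = k -> k + 1; // stand-in for any different hash

        int consistent = joinedKeys(route(keys, plain, reducers), route(keys, plain, reducers));
        int inconsistent = joinedKeys(route(keys, plain, reducers), route(keys, shifted, reducers));
        // prints "8 vs 0"
        System.out.println(consistent + " vs " + inconsistent);
    }
}
```

When both sides hash identically, every key finds its partner on the same reducer. When they differ, matching rows land on different reducers and the inner join silently drops them, which would surface as incorrect query results rather than as an error.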
      

      Attachments

        1. HIVE-7232.2.patch.txt
          17 kB
          Gopal Vijayaraghavan
        2. HIVE-7232.1.patch.txt
          17 kB
          Navis Ryu
        3. q5.sql
          0.6 kB
          Gopal Vijayaraghavan
        4. q5.explain.txt
          14 kB
          Gopal Vijayaraghavan
        5. HIVE-7232-extra-logging.patch
          3 kB
          Gopal Vijayaraghavan

          People

            Assignee: Gopal Vijayaraghavan (gopalv)
            Reporter: Gopal Vijayaraghavan (gopalv)
            Votes: 0
            Watchers: 4
