Details
Bug
Status: Closed
Major
Resolution: Fixed
0.14.0
None
Reviewed
VectorReduceSink is emitting incorrect JOIN keys (Navis, via Gopal V)
Description
After HIVE-7121, TPC-H query 5 returns incorrect results.
Thanks to Navis, the problem has been tracked down to the auto-parallel settings, which were initialized for ReduceSinkOperator but not for VectorReduceSinkOperator. The vectorized operator inherits from ReduceSinkOperator, but it neither calls super.initializeOp() nor sets up the variable correctly from ReduceSinkDesc.
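The failure mode described above can be sketched in isolation. This is a simplified model, not Hive code: `SinkDesc`, `RowSink`, `BuggyVectorSink`, `FixedVectorSink`, and the toy hash mixing are all hypothetical stand-ins, and the sketch assumes that the auto-parallel flag changes how a key is mapped to a reducer. Calling super.initializeOp() is shown as one possible remedy, not necessarily what the committed patch does.

```java
public class AutoParallelSketch {
    // Hypothetical stand-in for ReduceSinkDesc: carries the planner's setting.
    static class SinkDesc {
        final boolean autoParallel;
        SinkDesc(boolean autoParallel) { this.autoParallel = autoParallel; }
    }

    // Hypothetical stand-in for the row-mode reduce sink.
    static class RowSink {
        protected final SinkDesc conf;
        protected boolean autoParallel; // stays false until initializeOp() copies it

        RowSink(SinkDesc conf) { this.conf = conf; }

        void initializeOp() {
            autoParallel = conf.autoParallel;
        }

        // Toy partitioner: the flag selects a different hash of the same key.
        // The mixing formula is illustrative only.
        int partitionFor(int key, int numReducers) {
            int hash = autoParallel ? key * 31 + 17 : key;
            return Math.floorMod(hash, numReducers);
        }
    }

    // Overrides initializeOp() without calling super, so the flag is never set.
    static class BuggyVectorSink extends RowSink {
        BuggyVectorSink(SinkDesc conf) { super(conf); }

        @Override
        void initializeOp() {
            // vector-specific setup would go here; base-class setup is skipped
        }
    }

    // Same override, but the base-class initialization runs first.
    static class FixedVectorSink extends RowSink {
        FixedVectorSink(SinkDesc conf) { super(conf); }

        @Override
        void initializeOp() {
            super.initializeOp();
            // vector-specific setup would go here
        }
    }

    public static void main(String[] args) {
        SinkDesc desc = new SinkDesc(true);
        RowSink row = new RowSink(desc);
        RowSink buggy = new BuggyVectorSink(desc);
        RowSink fixed = new FixedVectorSink(desc);
        row.initializeOp();
        buggy.initializeOp();
        fixed.initializeOp();

        int key = 7;
        int reducers = 4;
        System.out.println("row-mode sink:     " + row.partitionFor(key, reducers));
        System.out.println("buggy vector sink: " + buggy.partitionFor(key, reducers));
        System.out.println("fixed vector sink: " + fixed.partitionFor(key, reducers));
    }
}
```

In this model, when one input of a join goes through the row-mode sink and the other through the buggy vectorized sink, rows with the same key can land on different reducers, so matching rows never meet and the join silently drops results.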
The query is TPC-H query 5, with extra NULL checks added just to be sure:
SELECT n_name,
       sum(l_extendedprice * (1 - l_discount)) AS revenue
FROM customer, orders, lineitem, supplier, nation, region
WHERE c_custkey = o_custkey
  AND l_orderkey = o_orderkey
  AND l_suppkey = s_suppkey
  AND c_nationkey = s_nationkey
  AND s_nationkey = n_nationkey
  AND n_regionkey = r_regionkey
  AND r_name = 'ASIA'
  AND o_orderdate >= '1994-01-01'
  AND o_orderdate < '1995-01-01'
  AND l_orderkey IS NOT NULL
  AND c_custkey IS NOT NULL
  AND l_suppkey IS NOT NULL
  AND c_nationkey IS NOT NULL
  AND s_nationkey IS NOT NULL
  AND n_regionkey IS NOT NULL
GROUP BY n_name
ORDER BY revenue DESC;
The reducer that shows the issue has the following plan:
Reducer 3
  Reduce Operator Tree:
    Join Operator
      condition map:
        Inner Join 0 to 1
      condition expressions:
        0 {KEY.reducesinkkey0} {VALUE._col2}
        1 {VALUE._col0} {KEY.reducesinkkey0} {VALUE._col3}
      outputColumnNames: _col0, _col3, _col10, _col11, _col14
      Statistics: Num rows: 183333344 Data size: 95229140992 Basic stats: COMPLETE Column stats: NONE
      Reduce Output Operator
        key expressions: _col10 (type: int)
        sort order: +
        Map-reduce partition columns: _col10 (type: int)
        Statistics: Num rows: 183333344 Data size: 95229140992 Basic stats: COMPLETE Column stats: NONE
        value expressions: _col0 (type: int), _col3 (type: int), _col11 (type: int), _col14 (type: string)