Details
-
Improvement
-
Status: Closed
-
Major
-
Resolution: Fixed
-
None
-
None
Description
Distribution of rows is skewed in DHJ causing slowdown.
Same RS outputs, but the two branches use VectorReduceSinkObjectHashOperator and VectorReduceSinkLongOperator.
| Select Operator | | expressions: ws_warehouse_sk (type: bigint), ws_order_number (type: bigint) | | outputColumnNames: _col0, _col1 | | Select Vectorization: | | className: VectorSelectOperator | | native: true | | projectedOutputColumnNums: [14, 16] | | Statistics: Num rows: 7199963324 Data size: 115185006696 Basic stats: COMPLETE Column stats: COMPLETE | | Reduce Output Operator | | key expressions: _col1 (type: bigint) | | sort order: + | | Map-reduce partition columns: _col1 (type: bigint) | | Reduce Sink Vectorization: | | className: VectorReduceSinkObjectHashOperator | | keyColumnNums: [16] | | native: true | | nativeConditionsMet: hive.vectorized.execution.reducesink.new.enabled IS true, hive.execution.engine tez IN [tez, spark] IS true, No PTF TopN IS true, No DISTINCT columns IS true, BinarySortableSerDe for keys IS true, LazyBinarySerDe for values IS true | | partitionColumnNums: [16] | | valueColumnNums: [14] | +----------------------------------------------------+ | Explain | +----------------------------------------------------+ | Statistics: Num rows: 7199963324 Data size: 115185006696 Basic stats: COMPLETE Column stats: COMPLETE | | value expressions: _col0 (type: bigint) | | Reduce Output Operator | | key expressions: _col1 (type: bigint) | | sort order: + | | Map-reduce partition columns: _col1 (type: bigint) | | Reduce Sink Vectorization: | | className: VectorReduceSinkLongOperator | | keyColumnNums: [16] | | native: true | | nativeConditionsMet: hive.vectorized.execution.reducesink.new.enabled IS true, hive.execution.engine tez IN [tez, spark] IS true, No PTF TopN IS true, No DISTINCT columns IS true, BinarySortableSerDe for keys IS true, LazyBinarySerDe for values IS true | | valueColumnNums: [14] | | Statistics: Num rows: 7199963324 Data size: 115185006696 Basic stats: COMPLETE Column stats: COMPLETE | | value expressions: _col0 (type: bigint) | | Execution mode: vectorized, llap |