Hive / HIVE-20954

Vector RS operator is not using uniform hash function for TPC-DS query 95

    Description

      Row distribution is skewed in the dynamically partitioned hash join (DHJ), which slows the query down.

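      Both ReduceSink (RS) branches emit the same output (key _col1, value _col0), but one uses VectorReduceSinkObjectHashOperator and the other uses VectorReduceSinkLongOperator. Only the first reports partitionColumnNums: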

      |                     Select Operator                |
      |                       expressions: ws_warehouse_sk (type: bigint), ws_order_number (type: bigint) |
      |                       outputColumnNames: _col0, _col1 |
      |                       Select Vectorization:        |
      |                           className: VectorSelectOperator |
      |                           native: true             |
      |                           projectedOutputColumnNums: [14, 16] |
      |                       Statistics: Num rows: 7199963324 Data size: 115185006696 Basic stats: COMPLETE Column stats: COMPLETE |
      |                       Reduce Output Operator       |
      |                         key expressions: _col1 (type: bigint) |
      |                         sort order: +              |
      |                         Map-reduce partition columns: _col1 (type: bigint) |
      |                         Reduce Sink Vectorization: |
      |                             className: VectorReduceSinkObjectHashOperator |
      |                             keyColumnNums: [16]    |
      |                             native: true           |
      |                             nativeConditionsMet: hive.vectorized.execution.reducesink.new.enabled IS true, hive.execution.engine tez IN [tez, spark] IS true, No PTF TopN IS true, No DISTINCT columns IS true, BinarySortableSerDe for keys IS true, LazyBinarySerDe for values IS true |
      |                             partitionColumnNums: [16] |
      |                             valueColumnNums: [14]  |
      |                         Statistics: Num rows: 7199963324 Data size: 115185006696 Basic stats: COMPLETE Column stats: COMPLETE |
      |                         value expressions: _col0 (type: bigint) |
      |                       Reduce Output Operator       |
      |                         key expressions: _col1 (type: bigint) |
      |                         sort order: +              |
      |                         Map-reduce partition columns: _col1 (type: bigint) |
      |                         Reduce Sink Vectorization: |
      |                             className: VectorReduceSinkLongOperator |
      |                             keyColumnNums: [16]    |
      |                             native: true           |
      |                             nativeConditionsMet: hive.vectorized.execution.reducesink.new.enabled IS true, hive.execution.engine tez IN [tez, spark] IS true, No PTF TopN IS true, No DISTINCT columns IS true, BinarySortableSerDe for keys IS true, LazyBinarySerDe for values IS true |
      |                             valueColumnNums: [14]  |
      |                         Statistics: Num rows: 7199963324 Data size: 115185006696 Basic stats: COMPLETE Column stats: COMPLETE |
      |                         value expressions: _col0 (type: bigint) |
      |             Execution mode: vectorized, llap       |
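
      To illustrate why the choice of hash function matters for row distribution, here is a minimal sketch. It does not reproduce what VectorReduceSinkLongOperator or VectorReduceSinkObjectHashOperator actually compute. The two hash functions (plainHash, mixingHash), the partitioning rule, and the strided key pattern are all assumptions chosen for the demonstration.

```java
import java.util.function.LongToIntFunction;

// Illustrative only: neither hash below is claimed to be the one used by
// Hive's vectorized ReduceSink operators. All names here are hypothetical.
final class PartitionSkewSketch {

    // Hypothetical "plain" hash: Java's built-in hash for a long value.
    static int plainHash(long key) {
        return Long.hashCode(key);
    }

    // Hypothetical "mixing" hash: the 64-bit finalizer from MurmurHash3,
    // folded down to 32 bits.
    static int mixingHash(long key) {
        long h = key;
        h ^= h >>> 33;
        h *= 0xff51afd7ed558ccdL;
        h ^= h >>> 33;
        h *= 0xc4ceb9fe1a85ec53L;
        h ^= h >>> 33;
        return (int) (h ^ (h >>> 32));
    }

    // Assumed partitioning rule: non-negative hash modulo the reducer count.
    static int partition(int hash, int numReducers) {
        return (hash & Integer.MAX_VALUE) % numReducers;
    }

    // Counts how many rows each reducer would receive.
    static int[] histogram(long[] keys, int numReducers, LongToIntFunction hash) {
        int[] counts = new int[numReducers];
        for (long key : keys) {
            counts[partition(hash.applyAsInt(key), numReducers)]++;
        }
        return counts;
    }

    // Size of the most heavily loaded reducer.
    static int max(int[] counts) {
        int best = 0;
        for (int c : counts) {
            best = Math.max(best, c);
        }
        return best;
    }

    // Keys 0, stride, 2*stride, ... : a pattern that exposes a weak hash
    // when the stride shares a factor with the reducer count.
    static long[] stridedKeys(int n, long stride) {
        long[] keys = new long[n];
        for (int i = 0; i < n; i++) {
            keys[i] = i * stride;
        }
        return keys;
    }

    public static void main(String[] args) {
        int reducers = 16;
        long[] keys = stridedKeys(1000, 16);
        System.out.println("largest partition, plain hash:  "
                + max(histogram(keys, reducers, PartitionSkewSketch::plainHash)));
        System.out.println("largest partition, mixing hash: "
                + max(histogram(keys, reducers, PartitionSkewSketch::mixingHash)));
    }
}
```

      With the plain hash, every key that is a multiple of 16 lands on the same one of 16 reducers, whereas the mixing hash spreads the same keys out. Whether the skew in this issue has the same cause is not established by the plan alone.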
      

      Attachments

        1. HIVE-20954.1.patch
          139 kB
          Teddy Choi
        2. HIVE-20954.2.patch
          147 kB
          Teddy Choi
        3. HIVE-20954.3.patch
          147 kB
          Teddy Choi

          People

            Teddy Choi (teddy.choi)
            Votes: 0
            Watchers: 2

              Time Tracking

                Estimated: Not Specified
                Remaining: 0h
                Logged: 20m