[HIVE-26365] Remove column statistics collection task from merge statement plan - ASF JIRA

XML

Word

Printable

JSON

Details

Type: Sub-task
Status: Closed
Priority: Major
Resolution: Fixed
Affects Version/s: None
Fix Version/s: 4.0.0-alpha-2
Component/s: None
Labels:
- pull-request-available

Description

Merge statements may contain delete and update branches. Update is technically a delete and an insert operation. Column statistics can not be calculated in case of delete operations from the changed records. Example: min, max.

Currently Hive marks the column stats of the target table invalid after Update/Delete/Merge but for merge extra GBY operators and reducers are generated for insert branches to calculate column stats and Stats works are collecting Column stats too.

POSTHOOK: query: explain
merge into acidTbl_n0 as t using nonAcidOrcTbl_n0 s ON t.a = s.a
WHEN MATCHED AND s.a > 8 THEN DELETE
WHEN MATCHED THEN UPDATE SET b = 7
WHEN NOT MATCHED THEN INSERT VALUES(s.a, s.b)
POSTHOOK: type: QUERY
POSTHOOK: Input: default@acidtbl_n0
POSTHOOK: Input: default@nonacidorctbl_n0
POSTHOOK: Output: default@acidtbl_n0
POSTHOOK: Output: default@acidtbl_n0
POSTHOOK: Output: default@merge_tmp_table
STAGE DEPENDENCIES:
  Stage-5 is a root stage
  Stage-6 depends on stages: Stage-5
  Stage-0 depends on stages: Stage-6
  Stage-7 depends on stages: Stage-0
  Stage-1 depends on stages: Stage-6
  Stage-8 depends on stages: Stage-1
  Stage-2 depends on stages: Stage-6
  Stage-9 depends on stages: Stage-2
  Stage-3 depends on stages: Stage-6
  Stage-10 depends on stages: Stage-3
  Stage-4 depends on stages: Stage-6
  Stage-11 depends on stages: Stage-4

STAGE PLANS:
  Stage: Stage-5
    Tez
#### A masked pattern was here ####
      Edges:
        Reducer 2 <- Map 1 (SIMPLE_EDGE), Map 10 (SIMPLE_EDGE)
        Reducer 3 <- Reducer 2 (SIMPLE_EDGE)
        Reducer 4 <- Reducer 2 (SIMPLE_EDGE)
        Reducer 5 <- Reducer 2 (SIMPLE_EDGE)
        Reducer 6 <- Reducer 5 (CUSTOM_SIMPLE_EDGE)
        Reducer 7 <- Reducer 2 (SIMPLE_EDGE)
        Reducer 8 <- Reducer 7 (CUSTOM_SIMPLE_EDGE)
        Reducer 9 <- Reducer 2 (SIMPLE_EDGE)
#### A masked pattern was here ####
      Vertices:
        Map 1 
            Map Operator Tree:
                TableScan
                  alias: s
                  Statistics: Num rows: 4 Data size: 32 Basic stats: COMPLETE Column stats: COMPLETE
                  Select Operator
                    expressions: a (type: int), b (type: int)
                    outputColumnNames: _col0, _col1
                    Statistics: Num rows: 4 Data size: 32 Basic stats: COMPLETE Column stats: COMPLETE
                    Reduce Output Operator
                      key expressions: _col0 (type: int)
                      null sort order: z
                      sort order: +
                      Map-reduce partition columns: _col0 (type: int)
                      Statistics: Num rows: 4 Data size: 32 Basic stats: COMPLETE Column stats: COMPLETE
                      value expressions: _col1 (type: int)
            Execution mode: vectorized, llap
            LLAP IO: all inputs
        Map 10 
            Map Operator Tree:
                TableScan
                  alias: t
                  filterExpr: a is not null (type: boolean)
                  Statistics: Num rows: 2 Data size: 8 Basic stats: COMPLETE Column stats: COMPLETE
                  Filter Operator
                    predicate: a is not null (type: boolean)
                    Statistics: Num rows: 2 Data size: 8 Basic stats: COMPLETE Column stats: COMPLETE
                    Select Operator
                      expressions: a (type: int), ROW__ID (type: struct<writeid:bigint,bucketid:int,rowid:bigint>)
                      outputColumnNames: _col0, _col1
                      Statistics: Num rows: 2 Data size: 160 Basic stats: COMPLETE Column stats: COMPLETE
                      Reduce Output Operator
                        key expressions: _col0 (type: int)
                        null sort order: z
                        sort order: +
                        Map-reduce partition columns: _col0 (type: int)
                        Statistics: Num rows: 2 Data size: 160 Basic stats: COMPLETE Column stats: COMPLETE
                        value expressions: _col1 (type: struct<writeid:bigint,bucketid:int,rowid:bigint>)
            Execution mode: vectorized, llap
            LLAP IO: may be used (ACID table)
        Reducer 2 
            Execution mode: llap
            Reduce Operator Tree:
              Merge Join Operator
                condition map:
                     Left Outer Join 0 to 1
                keys:
                  0 _col0 (type: int)
                  1 _col0 (type: int)
                outputColumnNames: _col0, _col1, _col2, _col3
                Statistics: Num rows: 6 Data size: 288 Basic stats: COMPLETE Column stats: COMPLETE
                Select Operator
                  expressions: _col3 (type: struct<writeid:bigint,bucketid:int,rowid:bigint>), _col1 (type: int), _col2 (type: int), _col0 (type: int)
                  outputColumnNames: _col0, _col1, _col2, _col3
                  Statistics: Num rows: 6 Data size: 288 Basic stats: COMPLETE Column stats: COMPLETE
                  Filter Operator
                    predicate: ((_col2 = _col3) and (_col3 > 8)) (type: boolean)
                    Statistics: Num rows: 1 Data size: 88 Basic stats: COMPLETE Column stats: COMPLETE
                    Select Operator
                      expressions: _col0 (type: struct<writeid:bigint,bucketid:int,rowid:bigint>)
                      outputColumnNames: _col0
                      Statistics: Num rows: 1 Data size: 76 Basic stats: COMPLETE Column stats: COMPLETE
                      Reduce Output Operator
                        key expressions: _col0 (type: struct<writeid:bigint,bucketid:int,rowid:bigint>)
                        null sort order: z
                        sort order: +
                        Map-reduce partition columns: UDFToInteger(_col0) (type: int)
                        Statistics: Num rows: 1 Data size: 76 Basic stats: COMPLETE Column stats: COMPLETE
                  Filter Operator
                    predicate: ((_col2 = _col3) and (_col3 <= 8)) (type: boolean)
                    Statistics: Num rows: 2 Data size: 176 Basic stats: COMPLETE Column stats: COMPLETE
                    Select Operator
                      expressions: _col0 (type: struct<writeid:bigint,bucketid:int,rowid:bigint>)
                      outputColumnNames: _col0
                      Statistics: Num rows: 2 Data size: 152 Basic stats: COMPLETE Column stats: COMPLETE
                      Reduce Output Operator
                        key expressions: _col0 (type: struct<writeid:bigint,bucketid:int,rowid:bigint>)
                        null sort order: z
                        sort order: +
                        Map-reduce partition columns: UDFToInteger(_col0) (type: int)
                        Statistics: Num rows: 2 Data size: 152 Basic stats: COMPLETE Column stats: COMPLETE
                  Filter Operator
                    predicate: ((_col2 = _col3) and (_col3 <= 8)) (type: boolean)
                    Statistics: Num rows: 2 Data size: 176 Basic stats: COMPLETE Column stats: COMPLETE
                    Select Operator
                      expressions: _col2 (type: int), 7 (type: int)
                      outputColumnNames: _col0, _col1
                      Statistics: Num rows: 2 Data size: 16 Basic stats: COMPLETE Column stats: COMPLETE
                      Reduce Output Operator
                        key expressions: _col0 (type: int)
                        null sort order: a
                        sort order: +
                        Map-reduce partition columns: _col0 (type: int)
                        Statistics: Num rows: 2 Data size: 16 Basic stats: COMPLETE Column stats: COMPLETE
                        value expressions: _col1 (type: int)
                  Filter Operator
                    predicate: _col2 is null (type: boolean)
                    Statistics: Num rows: 4 Data size: 192 Basic stats: COMPLETE Column stats: COMPLETE
                    Select Operator
                      expressions: _col3 (type: int), _col1 (type: int)
                      outputColumnNames: _col0, _col1
                      Statistics: Num rows: 4 Data size: 32 Basic stats: COMPLETE Column stats: COMPLETE
                      Reduce Output Operator
                        key expressions: _col0 (type: int)
                        null sort order: a
                        sort order: +
                        Map-reduce partition columns: _col0 (type: int)
                        Statistics: Num rows: 4 Data size: 32 Basic stats: COMPLETE Column stats: COMPLETE
                        value expressions: _col1 (type: int)
                  Filter Operator
                    predicate: (_col2 = _col3) (type: boolean)
                    Statistics: Num rows: 3 Data size: 184 Basic stats: COMPLETE Column stats: COMPLETE
                    Select Operator
                      expressions: _col0 (type: struct<writeid:bigint,bucketid:int,rowid:bigint>)
                      outputColumnNames: _col0
                      Statistics: Num rows: 3 Data size: 184 Basic stats: COMPLETE Column stats: COMPLETE
                      Group By Operator
                        aggregations: count()
                        keys: _col0 (type: struct<writeid:bigint,bucketid:int,rowid:bigint>)
                        minReductionHashAggr: 0.4
                        mode: hash
                        outputColumnNames: _col0, _col1
                        Statistics: Num rows: 2 Data size: 168 Basic stats: COMPLETE Column stats: COMPLETE
                        Reduce Output Operator
                          key expressions: _col0 (type: struct<writeid:bigint,bucketid:int,rowid:bigint>)
                          null sort order: z
                          sort order: +
                          Map-reduce partition columns: _col0 (type: struct<writeid:bigint,bucketid:int,rowid:bigint>)
                          Statistics: Num rows: 2 Data size: 168 Basic stats: COMPLETE Column stats: COMPLETE
                          value expressions: _col1 (type: bigint)
        Reducer 3 
            Execution mode: vectorized, llap
            Reduce Operator Tree:
              Select Operator
                expressions: KEY.reducesinkkey0 (type: struct<writeid:bigint,bucketid:int,rowid:bigint>)
                outputColumnNames: _col0
                Statistics: Num rows: 1 Data size: 76 Basic stats: COMPLETE Column stats: COMPLETE
                File Output Operator
                  compressed: false
                  Statistics: Num rows: 1 Data size: 76 Basic stats: COMPLETE Column stats: COMPLETE
                  table:
                      input format: org.apache.hadoop.hive.ql.io.orc.OrcInputFormat
                      output format: org.apache.hadoop.hive.ql.io.orc.OrcOutputFormat
                      serde: org.apache.hadoop.hive.ql.io.orc.OrcSerde
                      name: default.acidtbl_n0
                  Write Type: DELETE
        Reducer 4 
            Execution mode: vectorized, llap
            Reduce Operator Tree:
              Select Operator
                expressions: KEY.reducesinkkey0 (type: struct<writeid:bigint,bucketid:int,rowid:bigint>)
                outputColumnNames: _col0
                Statistics: Num rows: 2 Data size: 152 Basic stats: COMPLETE Column stats: COMPLETE
                File Output Operator
                  compressed: false
                  Statistics: Num rows: 2 Data size: 152 Basic stats: COMPLETE Column stats: COMPLETE
                  table:
                      input format: org.apache.hadoop.hive.ql.io.orc.OrcInputFormat
                      output format: org.apache.hadoop.hive.ql.io.orc.OrcOutputFormat
                      serde: org.apache.hadoop.hive.ql.io.orc.OrcSerde
                      name: default.acidtbl_n0
                  Write Type: DELETE
        Reducer 5 
            Execution mode: vectorized, llap
            Reduce Operator Tree:
              Select Operator
                expressions: KEY.reducesinkkey0 (type: int), VALUE._col0 (type: int)
                outputColumnNames: _col0, _col1
                Statistics: Num rows: 2 Data size: 16 Basic stats: COMPLETE Column stats: COMPLETE
                File Output Operator
                  compressed: false
                  Statistics: Num rows: 2 Data size: 16 Basic stats: COMPLETE Column stats: COMPLETE
                  table:
                      input format: org.apache.hadoop.hive.ql.io.orc.OrcInputFormat
                      output format: org.apache.hadoop.hive.ql.io.orc.OrcOutputFormat
                      serde: org.apache.hadoop.hive.ql.io.orc.OrcSerde
                      name: default.acidtbl_n0
                  Write Type: INSERT
                Select Operator
                  expressions: _col0 (type: int), _col1 (type: int)
                  outputColumnNames: a, b
                  Statistics: Num rows: 2 Data size: 16 Basic stats: COMPLETE Column stats: COMPLETE
                  Group By Operator
                    aggregations: min(a), max(a), count(1), count(a), compute_bit_vector_hll(a), min(b), max(b), count(b), compute_bit_vector_hll(b)
                    minReductionHashAggr: 0.5
                    mode: hash
                    outputColumnNames: _col0, _col1, _col2, _col3, _col4, _col5, _col6, _col7, _col8
                    Statistics: Num rows: 1 Data size: 328 Basic stats: COMPLETE Column stats: COMPLETE
                    Reduce Output Operator
                      null sort order: 
                      sort order: 
                      Statistics: Num rows: 1 Data size: 328 Basic stats: COMPLETE Column stats: COMPLETE
                      value expressions: _col0 (type: int), _col1 (type: int), _col2 (type: bigint), _col3 (type: bigint), _col4 (type: binary), _col5 (type: int), _col6 (type: int), _col7 (type: bigint), _col8 (type: binary)
        Reducer 6 
            Execution mode: vectorized, llap
            Reduce Operator Tree:
              Group By Operator
                aggregations: min(VALUE._col0), max(VALUE._col1), count(VALUE._col2), count(VALUE._col3), compute_bit_vector_hll(VALUE._col4), min(VALUE._col5), max(VALUE._col6), count(VALUE._col7), compute_bit_vector_hll(VALUE._col8)
                mode: mergepartial
                outputColumnNames: _col0, _col1, _col2, _col3, _col4, _col5, _col6, _col7, _col8
                Statistics: Num rows: 1 Data size: 328 Basic stats: COMPLETE Column stats: COMPLETE
                Select Operator
                  expressions: 'LONG' (type: string), UDFToLong(_col0) (type: bigint), UDFToLong(_col1) (type: bigint), (_col2 - _col3) (type: bigint), COALESCE(ndv_compute_bit_vector(_col4),0) (type: bigint), _col4 (type: binary), 'LONG' (type: string), UDFToLong(_col5) (type: bigint), UDFToLong(_col6) (type: bigint), (_col2 - _col7) (type: bigint), COALESCE(ndv_compute_bit_vector(_col8),0) (type: bigint), _col8 (type: binary)
                  outputColumnNames: _col0, _col1, _col2, _col3, _col4, _col5, _col6, _col7, _col8, _col9, _col10, _col11
                  Statistics: Num rows: 1 Data size: 528 Basic stats: COMPLETE Column stats: COMPLETE
                  File Output Operator
                    compressed: false
                    Statistics: Num rows: 1 Data size: 528 Basic stats: COMPLETE Column stats: COMPLETE
                    table:
                        input format: org.apache.hadoop.mapred.SequenceFileInputFormat
                        output format: org.apache.hadoop.hive.ql.io.HiveSequenceFileOutputFormat
                        serde: org.apache.hadoop.hive.serde2.lazy.LazySimpleSerDe
        Reducer 7 
            Execution mode: vectorized, llap
            Reduce Operator Tree:
              Select Operator
                expressions: KEY.reducesinkkey0 (type: int), VALUE._col0 (type: int)
                outputColumnNames: _col0, _col1
                Statistics: Num rows: 4 Data size: 32 Basic stats: COMPLETE Column stats: COMPLETE
                File Output Operator
                  compressed: false
                  Statistics: Num rows: 4 Data size: 32 Basic stats: COMPLETE Column stats: COMPLETE
                  table:
                      input format: org.apache.hadoop.hive.ql.io.orc.OrcInputFormat
                      output format: org.apache.hadoop.hive.ql.io.orc.OrcOutputFormat
                      serde: org.apache.hadoop.hive.ql.io.orc.OrcSerde
                      name: default.acidtbl_n0
                  Write Type: INSERT
                Select Operator
                  expressions: _col0 (type: int), _col1 (type: int)
                  outputColumnNames: a, b
                  Statistics: Num rows: 4 Data size: 32 Basic stats: COMPLETE Column stats: COMPLETE
                  Group By Operator
                    aggregations: min(a), max(a), count(1), count(a), compute_bit_vector_hll(a), min(b), max(b), count(b), compute_bit_vector_hll(b)
                    minReductionHashAggr: 0.75
                    mode: hash
                    outputColumnNames: _col0, _col1, _col2, _col3, _col4, _col5, _col6, _col7, _col8
                    Statistics: Num rows: 1 Data size: 328 Basic stats: COMPLETE Column stats: COMPLETE
                    Reduce Output Operator
                      null sort order: 
                      sort order: 
                      Statistics: Num rows: 1 Data size: 328 Basic stats: COMPLETE Column stats: COMPLETE
                      value expressions: _col0 (type: int), _col1 (type: int), _col2 (type: bigint), _col3 (type: bigint), _col4 (type: binary), _col5 (type: int), _col6 (type: int), _col7 (type: bigint), _col8 (type: binary)
        Reducer 8 
            Execution mode: vectorized, llap
            Reduce Operator Tree:
              Group By Operator
                aggregations: min(VALUE._col0), max(VALUE._col1), count(VALUE._col2), count(VALUE._col3), compute_bit_vector_hll(VALUE._col4), min(VALUE._col5), max(VALUE._col6), count(VALUE._col7), compute_bit_vector_hll(VALUE._col8)
                mode: mergepartial
                outputColumnNames: _col0, _col1, _col2, _col3, _col4, _col5, _col6, _col7, _col8
                Statistics: Num rows: 1 Data size: 328 Basic stats: COMPLETE Column stats: COMPLETE
                Select Operator
                  expressions: 'LONG' (type: string), UDFToLong(_col0) (type: bigint), UDFToLong(_col1) (type: bigint), (_col2 - _col3) (type: bigint), COALESCE(ndv_compute_bit_vector(_col4),0) (type: bigint), _col4 (type: binary), 'LONG' (type: string), UDFToLong(_col5) (type: bigint), UDFToLong(_col6) (type: bigint), (_col2 - _col7) (type: bigint), COALESCE(ndv_compute_bit_vector(_col8),0) (type: bigint), _col8 (type: binary)
                  outputColumnNames: _col0, _col1, _col2, _col3, _col4, _col5, _col6, _col7, _col8, _col9, _col10, _col11
                  Statistics: Num rows: 1 Data size: 528 Basic stats: COMPLETE Column stats: COMPLETE
                  File Output Operator
                    compressed: false
                    Statistics: Num rows: 1 Data size: 528 Basic stats: COMPLETE Column stats: COMPLETE
                    table:
                        input format: org.apache.hadoop.mapred.SequenceFileInputFormat
                        output format: org.apache.hadoop.hive.ql.io.HiveSequenceFileOutputFormat
                        serde: org.apache.hadoop.hive.serde2.lazy.LazySimpleSerDe
        Reducer 9 
            Execution mode: llap
            Reduce Operator Tree:
              Group By Operator
                aggregations: count(VALUE._col0)
                keys: KEY._col0 (type: struct<writeid:bigint,bucketid:int,rowid:bigint>)
                mode: mergepartial
                outputColumnNames: _col0, _col1
                Statistics: Num rows: 2 Data size: 168 Basic stats: COMPLETE Column stats: COMPLETE
                Filter Operator
                  predicate: (_col1 > 1L) (type: boolean)
                  Statistics: Num rows: 1 Data size: 84 Basic stats: COMPLETE Column stats: COMPLETE
                  Select Operator
                    expressions: cardinality_violation(_col0) (type: int)
                    outputColumnNames: _col0
                    Statistics: Num rows: 1 Data size: 4 Basic stats: COMPLETE Column stats: COMPLETE
                    File Output Operator
                      compressed: false
                      Statistics: Num rows: 1 Data size: 4 Basic stats: COMPLETE Column stats: COMPLETE
                      table:
                          input format: org.apache.hadoop.mapred.TextInputFormat
                          output format: org.apache.hadoop.hive.ql.io.HiveIgnoreKeyTextOutputFormat
                          serde: org.apache.hadoop.hive.serde2.lazy.LazySimpleSerDe
                          name: default.merge_tmp_table

  Stage: Stage-6
    Dependency Collection

  Stage: Stage-0
    Move Operator
      tables:
          replace: false
          table:
              input format: org.apache.hadoop.hive.ql.io.orc.OrcInputFormat
              output format: org.apache.hadoop.hive.ql.io.orc.OrcOutputFormat
              serde: org.apache.hadoop.hive.ql.io.orc.OrcSerde
              name: default.acidtbl_n0
          Write Type: DELETE

  Stage: Stage-7
    Stats Work
      Basic Stats Work:

  Stage: Stage-1
    Move Operator
      tables:
          replace: false
          table:
              input format: org.apache.hadoop.hive.ql.io.orc.OrcInputFormat
              output format: org.apache.hadoop.hive.ql.io.orc.OrcOutputFormat
              serde: org.apache.hadoop.hive.ql.io.orc.OrcSerde
              name: default.acidtbl_n0
          Write Type: DELETE

  Stage: Stage-8
    Stats Work
      Basic Stats Work:

  Stage: Stage-2
    Move Operator
      tables:
          replace: false
          table:
              input format: org.apache.hadoop.hive.ql.io.orc.OrcInputFormat
              output format: org.apache.hadoop.hive.ql.io.orc.OrcOutputFormat
              serde: org.apache.hadoop.hive.ql.io.orc.OrcSerde
              name: default.acidtbl_n0
          Write Type: INSERT

  Stage: Stage-9
    Stats Work
      Basic Stats Work:

  Stage: Stage-3
    Move Operator
      tables:
          replace: false
          table:
              input format: org.apache.hadoop.hive.ql.io.orc.OrcInputFormat
              output format: org.apache.hadoop.hive.ql.io.orc.OrcOutputFormat
              serde: org.apache.hadoop.hive.ql.io.orc.OrcSerde
              name: default.acidtbl_n0
          Write Type: INSERT

  Stage: Stage-10
    Stats Work
      Basic Stats Work:
      Column Stats Desc:
          Columns: a, b
          Column Types: int, int
          Table: default.acidtbl_n0

  Stage: Stage-4
    Move Operator
      tables:
          replace: false
          table:
              input format: org.apache.hadoop.mapred.TextInputFormat
              output format: org.apache.hadoop.hive.ql.io.HiveIgnoreKeyTextOutputFormat
              serde: org.apache.hadoop.hive.serde2.lazy.LazySimpleSerDe
              name: default.merge_tmp_table

  Stage: Stage-11
    Stats Work
      Basic Stats Work:

One of the insert Reducers and the follow-up Reducer for col stats collecting:

        Reducer 5 
            Execution mode: vectorized, llap
            Reduce Operator Tree:
              Select Operator
                expressions: KEY.reducesinkkey0 (type: int), VALUE._col0 (type: int)
                outputColumnNames: _col0, _col1
                Statistics: Num rows: 2 Data size: 16 Basic stats: COMPLETE Column stats: COMPLETE
                File Output Operator
                  compressed: false
                  Statistics: Num rows: 2 Data size: 16 Basic stats: COMPLETE Column stats: COMPLETE
                  table:
                      input format: org.apache.hadoop.hive.ql.io.orc.OrcInputFormat
                      output format: org.apache.hadoop.hive.ql.io.orc.OrcOutputFormat
                      serde: org.apache.hadoop.hive.ql.io.orc.OrcSerde
                      name: default.acidtbl_n0
                  Write Type: INSERT
                Select Operator
                  expressions: _col0 (type: int), _col1 (type: int)
                  outputColumnNames: a, b
                  Statistics: Num rows: 2 Data size: 16 Basic stats: COMPLETE Column stats: COMPLETE
                  Group By Operator
                    aggregations: min(a), max(a), count(1), count(a), compute_bit_vector_hll(a), min(b), max(b), count(b), compute_bit_vector_hll(b)
                    minReductionHashAggr: 0.5
                    mode: hash
                    outputColumnNames: _col0, _col1, _col2, _col3, _col4, _col5, _col6, _col7, _col8
                    Statistics: Num rows: 1 Data size: 328 Basic stats: COMPLETE Column stats: COMPLETE
                    Reduce Output Operator
                      null sort order: 
                      sort order: 
                      Statistics: Num rows: 1 Data size: 328 Basic stats: COMPLETE Column stats: COMPLETE
                      value expressions: _col0 (type: int), _col1 (type: int), _col2 (type: bigint), _col3 (type: bigint), _col4 (type: binary), _col5 (type: int), _col6 (type: int), _col7 (type: bigint), _col8 (type: binary)
        Reducer 6 
            Execution mode: vectorized, llap
            Reduce Operator Tree:
              Group By Operator
                aggregations: min(VALUE._col0), max(VALUE._col1), count(VALUE._col2), count(VALUE._col3), compute_bit_vector_hll(VALUE._col4), min(VALUE._col5), max(VALUE._col6), count(VALUE._col7), compute_bit_vector_hll(VALUE._col8)
                mode: mergepartial
                outputColumnNames: _col0, _col1, _col2, _col3, _col4, _col5, _col6, _col7, _col8
                Statistics: Num rows: 1 Data size: 328 Basic stats: COMPLETE Column stats: COMPLETE
                Select Operator
                  expressions: 'LONG' (type: string), UDFToLong(_col0) (type: bigint), UDFToLong(_col1) (type: bigint), (_col2 - _col3) (type: bigint), COALESCE(ndv_compute_bit_vector(_col4),0) (type: bigint), _col4 (type: binary), 'LONG' (type: string), UDFToLong(_col5) (type: bigint), UDFToLong(_col6) (type: bigint), (_col2 - _col7) (type: bigint), COALESCE(ndv_compute_bit_vector(_col8),0) (type: bigint), _col8 (type: binary)
                  outputColumnNames: _col0, _col1, _col2, _col3, _col4, _col5, _col6, _col7, _col8, _col9, _col10, _col11
                  Statistics: Num rows: 1 Data size: 528 Basic stats: COMPLETE Column stats: COMPLETE
                  File Output Operator
                    compressed: false
                    Statistics: Num rows: 1 Data size: 528 Basic stats: COMPLETE Column stats: COMPLETE
                    table:
                        input format: org.apache.hadoop.mapred.SequenceFileInputFormat
                        output format: org.apache.hadoop.hive.ql.io.HiveSequenceFileOutputFormat
                        serde: org.apache.hadoop.hive.serde2.lazy.LazySimpleSerDe

Attachments

Issue Links

links to

GitHub Pull Request #3411

Activity

People

Assignee:: Krisztian Kasa

Reporter:: Krisztian Kasa

Votes:: 0 Vote for this issue

Watchers:: 2 Start watching this issue

Dates

Created:: 29/Jun/22 11:17

Updated:: 16/Nov/22 13:50

Resolved:: 30/Jun/22 12:24

Time Tracking

Estimated:

Not Specified

Remaining:

Logged:

20m