  1. Hive
  2. HIVE-16923 Hive-on-Spark DPP Improvements
  3. HIVE-17414

HoS DPP + Vectorization generates invalid explain plan due to CombineEquivalentWorkResolver

Details

    • Type: Sub-task
    • Status: Closed
    • Priority: Major
    • Resolution: Fixed
    • Affects Version/s: None
    • Fix Version/s: 3.0.0
    • Component/s: Spark
    • Labels: None

    Description

      Similar to HIVE-16948, the following query generates an invalid explain plan when both Hive-on-Spark (HoS) dynamic partition pruning (DPP) and vectorization are enabled:

      select ds from (select distinct(ds) as ds from srcpart union all select distinct(ds) as ds from srcpart) s where s.ds in (select max(srcpart.ds) from srcpart union all select min(srcpart.ds) from srcpart)
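
      A minimal reproduction sketch, assuming the standard srcpart test table and a session running on the Spark engine. The report does not list the exact settings used, so the three properties below are an assumption about what "HoS DPP + vectorization" means in practice:

      ```sql
      -- Assumed session settings; the report does not state the exact values used.
      set hive.execution.engine=spark;
      set hive.spark.dynamic.partition.pruning=true;
      set hive.vectorized.execution.enabled=true;

      -- Same query as above, reformatted and wrapped in EXPLAIN.
      explain
      select ds
      from (
        select distinct(ds) as ds from srcpart
        union all
        select distinct(ds) as ds from srcpart
      ) s
      where s.ds in (
        select max(srcpart.ds) from srcpart
        union all
        select min(srcpart.ds) from srcpart
      );
      ```

      One thing to look for in the output: in the plan reported for this issue, some Spark Partition Pruning Sink Operators list "target work: Map 4", while no Map 4 vertex appears in any stage, and Reducer 3 lists Reducer 2 twice among its edges.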
      

      Explain Plan:

      STAGE DEPENDENCIES:
        Stage-2 is a root stage
        Stage-1 depends on stages: Stage-2
        Stage-0 depends on stages: Stage-1
      
      STAGE PLANS:
        Stage: Stage-2
          Spark
            Edges:
              Reducer 11 <- Map 10 (GROUP, 1)
              Reducer 13 <- Map 12 (GROUP, 1)
      #### A masked pattern was here ####
            Vertices:
              Map 10
                  Map Operator Tree:
                      TableScan
                        alias: srcpart
                        Statistics: Num rows: 2000 Data size: 21248 Basic stats: COMPLETE Column stats: NONE
                        Select Operator
                          expressions: ds (type: string)
                          outputColumnNames: ds
                          Statistics: Num rows: 2000 Data size: 21248 Basic stats: COMPLETE Column stats: NONE
                          Group By Operator
                            aggregations: max(ds)
                            mode: hash
                            outputColumnNames: _col0
                            Statistics: Num rows: 1 Data size: 184 Basic stats: COMPLETE Column stats: NONE
                            Reduce Output Operator
                              sort order:
                              Statistics: Num rows: 1 Data size: 184 Basic stats: COMPLETE Column stats: NONE
                              value expressions: _col0 (type: string)
                  Execution mode: vectorized
              Map 12
                  Map Operator Tree:
                      TableScan
                        alias: srcpart
                        Statistics: Num rows: 2000 Data size: 21248 Basic stats: COMPLETE Column stats: NONE
                        Select Operator
                          expressions: ds (type: string)
                          outputColumnNames: ds
                          Statistics: Num rows: 2000 Data size: 21248 Basic stats: COMPLETE Column stats: NONE
                          Group By Operator
                            aggregations: min(ds)
                            mode: hash
                            outputColumnNames: _col0
                            Statistics: Num rows: 1 Data size: 184 Basic stats: COMPLETE Column stats: NONE
                            Reduce Output Operator
                              sort order:
                              Statistics: Num rows: 1 Data size: 184 Basic stats: COMPLETE Column stats: NONE
                              value expressions: _col0 (type: string)
                  Execution mode: vectorized
              Reducer 11
                  Execution mode: vectorized
                  Reduce Operator Tree:
                    Group By Operator
                      aggregations: max(VALUE._col0)
                      mode: mergepartial
                      outputColumnNames: _col0
                      Statistics: Num rows: 1 Data size: 184 Basic stats: COMPLETE Column stats: NONE
                      Filter Operator
                        predicate: _col0 is not null (type: boolean)
                        Statistics: Num rows: 1 Data size: 184 Basic stats: COMPLETE Column stats: NONE
                        Group By Operator
                          keys: _col0 (type: string)
                          mode: hash
                          outputColumnNames: _col0
                          Statistics: Num rows: 2 Data size: 368 Basic stats: COMPLETE Column stats: NONE
                          Select Operator
                            expressions: _col0 (type: string)
                            outputColumnNames: _col0
                            Statistics: Num rows: 2 Data size: 368 Basic stats: COMPLETE Column stats: NONE
                            Group By Operator
                              keys: _col0 (type: string)
                              mode: hash
                              outputColumnNames: _col0
                              Statistics: Num rows: 2 Data size: 368 Basic stats: COMPLETE Column stats: NONE
                              Spark Partition Pruning Sink Operator
                                Target column: ds (string)
                                partition key expr: ds
                                Statistics: Num rows: 2 Data size: 368 Basic stats: COMPLETE Column stats: NONE
                                target work: Map 1
                          Select Operator
                            expressions: _col0 (type: string)
                            outputColumnNames: _col0
                            Statistics: Num rows: 2 Data size: 368 Basic stats: COMPLETE Column stats: NONE
                            Group By Operator
                              keys: _col0 (type: string)
                              mode: hash
                              outputColumnNames: _col0
                              Statistics: Num rows: 2 Data size: 368 Basic stats: COMPLETE Column stats: NONE
                              Spark Partition Pruning Sink Operator
                                Target column: ds (string)
                                partition key expr: ds
                                Statistics: Num rows: 2 Data size: 368 Basic stats: COMPLETE Column stats: NONE
                                target work: Map 4
              Reducer 13
                  Execution mode: vectorized
                  Reduce Operator Tree:
                    Group By Operator
                      aggregations: min(VALUE._col0)
                      mode: mergepartial
                      outputColumnNames: _col0
                      Statistics: Num rows: 1 Data size: 184 Basic stats: COMPLETE Column stats: NONE
                      Filter Operator
                        predicate: _col0 is not null (type: boolean)
                        Statistics: Num rows: 1 Data size: 184 Basic stats: COMPLETE Column stats: NONE
                        Group By Operator
                          keys: _col0 (type: string)
                          mode: hash
                          outputColumnNames: _col0
                          Statistics: Num rows: 2 Data size: 368 Basic stats: COMPLETE Column stats: NONE
                          Select Operator
                            expressions: _col0 (type: string)
                            outputColumnNames: _col0
                            Statistics: Num rows: 2 Data size: 368 Basic stats: COMPLETE Column stats: NONE
                            Group By Operator
                              keys: _col0 (type: string)
                              mode: hash
                              outputColumnNames: _col0
                              Statistics: Num rows: 2 Data size: 368 Basic stats: COMPLETE Column stats: NONE
                              Spark Partition Pruning Sink Operator
                                Target column: ds (string)
                                partition key expr: ds
                                Statistics: Num rows: 2 Data size: 368 Basic stats: COMPLETE Column stats: NONE
                                target work: Map 1
                          Select Operator
                            expressions: _col0 (type: string)
                            outputColumnNames: _col0
                            Statistics: Num rows: 2 Data size: 368 Basic stats: COMPLETE Column stats: NONE
                            Group By Operator
                              keys: _col0 (type: string)
                              mode: hash
                              outputColumnNames: _col0
                              Statistics: Num rows: 2 Data size: 368 Basic stats: COMPLETE Column stats: NONE
                              Spark Partition Pruning Sink Operator
                                Target column: ds (string)
                                partition key expr: ds
                                Statistics: Num rows: 2 Data size: 368 Basic stats: COMPLETE Column stats: NONE
                                target work: Map 4
      
        Stage: Stage-1
          Spark
            Edges:
              Reducer 2 <- Map 1 (GROUP, 4)
              Reducer 3 <- Reducer 2 (PARTITION-LEVEL SORT, 4), Reducer 2 (PARTITION-LEVEL SORT, 4), Reducer 7 (PARTITION-LEVEL SORT, 4), Reducer 9 (PARTITION-LEVEL SORT, 4)
              Reducer 7 <- Map 6 (GROUP, 1)
              Reducer 9 <- Map 8 (GROUP, 1)
      #### A masked pattern was here ####
            Vertices:
              Map 1
                  Map Operator Tree:
                      TableScan
                        alias: srcpart
                        filterExpr: ds is not null (type: boolean)
                        Statistics: Num rows: 2000 Data size: 21248 Basic stats: COMPLETE Column stats: NONE
                        Group By Operator
                          keys: ds (type: string)
                          mode: hash
                          outputColumnNames: _col0
                          Statistics: Num rows: 2000 Data size: 21248 Basic stats: COMPLETE Column stats: NONE
                          Reduce Output Operator
                            key expressions: _col0 (type: string)
                            sort order: +
                            Map-reduce partition columns: _col0 (type: string)
                            Statistics: Num rows: 2000 Data size: 21248 Basic stats: COMPLETE Column stats: NONE
                  Execution mode: vectorized
              Map 6
                  Map Operator Tree:
                      TableScan
                        alias: srcpart
                        Statistics: Num rows: 2000 Data size: 21248 Basic stats: COMPLETE Column stats: NONE
                        Select Operator
                          expressions: ds (type: string)
                          outputColumnNames: ds
                          Statistics: Num rows: 2000 Data size: 21248 Basic stats: COMPLETE Column stats: NONE
                          Group By Operator
                            aggregations: max(ds)
                            mode: hash
                            outputColumnNames: _col0
                            Statistics: Num rows: 1 Data size: 184 Basic stats: COMPLETE Column stats: NONE
                            Reduce Output Operator
                              sort order:
                              Statistics: Num rows: 1 Data size: 184 Basic stats: COMPLETE Column stats: NONE
                              value expressions: _col0 (type: string)
                  Execution mode: vectorized
              Map 8
                  Map Operator Tree:
                      TableScan
                        alias: srcpart
                        Statistics: Num rows: 2000 Data size: 21248 Basic stats: COMPLETE Column stats: NONE
                        Select Operator
                          expressions: ds (type: string)
                          outputColumnNames: ds
                          Statistics: Num rows: 2000 Data size: 21248 Basic stats: COMPLETE Column stats: NONE
                          Group By Operator
                            aggregations: min(ds)
                            mode: hash
                            outputColumnNames: _col0
                            Statistics: Num rows: 1 Data size: 184 Basic stats: COMPLETE Column stats: NONE
                            Reduce Output Operator
                              sort order:
                              Statistics: Num rows: 1 Data size: 184 Basic stats: COMPLETE Column stats: NONE
                              value expressions: _col0 (type: string)
                  Execution mode: vectorized
              Reducer 2
                  Execution mode: vectorized
                  Reduce Operator Tree:
                    Group By Operator
                      keys: KEY._col0 (type: string)
                      mode: mergepartial
                      outputColumnNames: _col0
                      Statistics: Num rows: 1000 Data size: 10624 Basic stats: COMPLETE Column stats: NONE
                      Reduce Output Operator
                        key expressions: _col0 (type: string)
                        sort order: +
                        Map-reduce partition columns: _col0 (type: string)
                        Statistics: Num rows: 2000 Data size: 21248 Basic stats: COMPLETE Column stats: NONE
              Reducer 3
                  Reduce Operator Tree:
                    Join Operator
                      condition map:
                           Left Semi Join 0 to 1
                      keys:
                        0 _col0 (type: string)
                        1 _col0 (type: string)
                      outputColumnNames: _col0
                      Statistics: Num rows: 2200 Data size: 23372 Basic stats: COMPLETE Column stats: NONE
                      File Output Operator
                        compressed: false
                        Statistics: Num rows: 2200 Data size: 23372 Basic stats: COMPLETE Column stats: NONE
                        table:
                            input format: org.apache.hadoop.mapred.SequenceFileInputFormat
                            output format: org.apache.hadoop.hive.ql.io.HiveSequenceFileOutputFormat
                            serde: org.apache.hadoop.hive.serde2.lazy.LazySimpleSerDe
              Reducer 7
                  Execution mode: vectorized
                  Reduce Operator Tree:
                    Group By Operator
                      aggregations: max(VALUE._col0)
                      mode: mergepartial
                      outputColumnNames: _col0
                      Statistics: Num rows: 1 Data size: 184 Basic stats: COMPLETE Column stats: NONE
                      Filter Operator
                        predicate: _col0 is not null (type: boolean)
                        Statistics: Num rows: 1 Data size: 184 Basic stats: COMPLETE Column stats: NONE
                        Group By Operator
                          keys: _col0 (type: string)
                          mode: hash
                          outputColumnNames: _col0
                          Statistics: Num rows: 2 Data size: 368 Basic stats: COMPLETE Column stats: NONE
                          Reduce Output Operator
                            key expressions: _col0 (type: string)
                            sort order: +
                            Map-reduce partition columns: _col0 (type: string)
                            Statistics: Num rows: 2 Data size: 368 Basic stats: COMPLETE Column stats: NONE
              Reducer 9
                  Execution mode: vectorized
                  Reduce Operator Tree:
                    Group By Operator
                      aggregations: min(VALUE._col0)
                      mode: mergepartial
                      outputColumnNames: _col0
                      Statistics: Num rows: 1 Data size: 184 Basic stats: COMPLETE Column stats: NONE
                      Filter Operator
                        predicate: _col0 is not null (type: boolean)
                        Statistics: Num rows: 1 Data size: 184 Basic stats: COMPLETE Column stats: NONE
                        Group By Operator
                          keys: _col0 (type: string)
                          mode: hash
                          outputColumnNames: _col0
                          Statistics: Num rows: 2 Data size: 368 Basic stats: COMPLETE Column stats: NONE
                          Reduce Output Operator
                            key expressions: _col0 (type: string)
                            sort order: +
                            Map-reduce partition columns: _col0 (type: string)
                            Statistics: Num rows: 2 Data size: 368 Basic stats: COMPLETE Column stats: NONE
      
        Stage: Stage-0
          Fetch Operator
            limit: -1
            Processor Tree:
              ListSink
      

      Attachments

        1. HIVE-17414.patch
          0.7 kB
          liyunzhang
        2. HIVE-17414.1.patch
          0.9 kB
          liyunzhang
        3. HIVE-17414.2.patch
          1 kB
          liyunzhang
        4. HIVE-17414.3.patch
          6 kB
          liyunzhang
        5. HIVE-17414.4.patch
          14 kB
          liyunzhang
        6. HIVE-17414.5.patch
          4 kB
          liyunzhang

            People

              Assignee: kellyzly liyunzhang
              Reporter: stakiar Sahil Takiar
              Votes: 0
              Watchers: 6