HIVE-16948

Invalid explain when running dynamic partition pruning query in Hive On Spark

Details

    • Type: Bug
    • Status: Closed
    • Priority: Major
    • Resolution: Fixed
    • Affects Version/s: None
    • Fix Version/s: 3.0.0
    • Component/s: None
    • Labels: None

    Description

      Reproduce with the union_subquery.q case in spark_dynamic_partition_pruning.q:

      set hive.optimize.ppd=true;
      set hive.ppd.remove.duplicatefilters=true;
      set hive.spark.dynamic.partition.pruning=true;
      set hive.optimize.metadataonly=false;
      set hive.optimize.index.filter=true;
      set hive.strict.checks.cartesian.product=false;
      explain select ds from (select distinct(ds) as ds from srcpart union all select distinct(ds) as ds from srcpart) s where s.ds in (select max(srcpart.ds) from srcpart union all select min(srcpart.ds) from srcpart);
      

      Explain output:

      STAGE DEPENDENCIES:
        Stage-2 is a root stage
        Stage-1 depends on stages: Stage-2
        Stage-0 depends on stages: Stage-1
      
      STAGE PLANS:
        Stage: Stage-2
          Spark
            Edges:
              Reducer 11 <- Map 10 (GROUP, 1)
              Reducer 13 <- Map 12 (GROUP, 1)
            DagName: root_20170622231525_20a777e5-e659-4138-b605-65f8395e18e2:2
            Vertices:
              Map 10 
                  Map Operator Tree:
                      TableScan
                        alias: srcpart
                        Statistics: Num rows: 1 Data size: 23248 Basic stats: PARTIAL Column stats: NONE
                        Select Operator
                          expressions: ds (type: string)
                          outputColumnNames: ds
                          Statistics: Num rows: 1 Data size: 23248 Basic stats: PARTIAL Column stats: NONE
                          Group By Operator
                            aggregations: max(ds)
                            mode: hash
                            outputColumnNames: _col0
                            Statistics: Num rows: 1 Data size: 184 Basic stats: COMPLETE Column stats: NONE
                            Reduce Output Operator
                              sort order: 
                              Statistics: Num rows: 1 Data size: 184 Basic stats: COMPLETE Column stats: NONE
                              value expressions: _col0 (type: string)
              Map 12 
                  Map Operator Tree:
                      TableScan
                        alias: srcpart
                        Statistics: Num rows: 1 Data size: 23248 Basic stats: PARTIAL Column stats: NONE
                        Select Operator
                          expressions: ds (type: string)
                          outputColumnNames: ds
                          Statistics: Num rows: 1 Data size: 23248 Basic stats: PARTIAL Column stats: NONE
                          Group By Operator
                            aggregations: min(ds)
                            mode: hash
                            outputColumnNames: _col0
                            Statistics: Num rows: 1 Data size: 184 Basic stats: COMPLETE Column stats: NONE
                            Reduce Output Operator
                              sort order: 
                              Statistics: Num rows: 1 Data size: 184 Basic stats: COMPLETE Column stats: NONE
                              value expressions: _col0 (type: string)
              Reducer 11 
                  Reduce Operator Tree:
                    Group By Operator
                      aggregations: max(VALUE._col0)
                      mode: mergepartial
                      outputColumnNames: _col0
                      Statistics: Num rows: 1 Data size: 184 Basic stats: COMPLETE Column stats: NONE
                      Filter Operator
                        predicate: _col0 is not null (type: boolean)
                        Statistics: Num rows: 1 Data size: 184 Basic stats: COMPLETE Column stats: NONE
                        Group By Operator
                          keys: _col0 (type: string)
                          mode: hash
                          outputColumnNames: _col0
                          Statistics: Num rows: 2 Data size: 368 Basic stats: COMPLETE Column stats: NONE
                          Select Operator
                            expressions: _col0 (type: string)
                            outputColumnNames: _col0
                            Statistics: Num rows: 2 Data size: 368 Basic stats: COMPLETE Column stats: NONE
                            Group By Operator
                              keys: _col0 (type: string)
                              mode: hash
                              outputColumnNames: _col0
                              Statistics: Num rows: 2 Data size: 368 Basic stats: COMPLETE Column stats: NONE
                              Spark Partition Pruning Sink Operator
                                partition key expr: ds
                                Statistics: Num rows: 2 Data size: 368 Basic stats: COMPLETE Column stats: NONE
                                target column name: ds
                                target work: Map 1
                          Select Operator
                            expressions: _col0 (type: string)
                            outputColumnNames: _col0
                            Statistics: Num rows: 2 Data size: 368 Basic stats: COMPLETE Column stats: NONE
                            Group By Operator
                              keys: _col0 (type: string)
                              mode: hash
                              outputColumnNames: _col0
                              Statistics: Num rows: 2 Data size: 368 Basic stats: COMPLETE Column stats: NONE
                              Spark Partition Pruning Sink Operator
                                partition key expr: ds
                                Statistics: Num rows: 2 Data size: 368 Basic stats: COMPLETE Column stats: NONE
                                target column name: ds
                                target work: Map 4
              Reducer 13 
                  Reduce Operator Tree:
                    Group By Operator
                      aggregations: min(VALUE._col0)
                      mode: mergepartial
                      outputColumnNames: _col0
                      Statistics: Num rows: 1 Data size: 184 Basic stats: COMPLETE Column stats: NONE
                      Filter Operator
                        predicate: _col0 is not null (type: boolean)
                        Statistics: Num rows: 1 Data size: 184 Basic stats: COMPLETE Column stats: NONE
                        Group By Operator
                          keys: _col0 (type: string)
                          mode: hash
                          outputColumnNames: _col0
                          Statistics: Num rows: 2 Data size: 368 Basic stats: COMPLETE Column stats: NONE
                          Select Operator
                            expressions: _col0 (type: string)
                            outputColumnNames: _col0
                            Statistics: Num rows: 2 Data size: 368 Basic stats: COMPLETE Column stats: NONE
                            Group By Operator
                              keys: _col0 (type: string)
                              mode: hash
                              outputColumnNames: _col0
                              Statistics: Num rows: 2 Data size: 368 Basic stats: COMPLETE Column stats: NONE
                              Spark Partition Pruning Sink Operator
                                partition key expr: ds
                                Statistics: Num rows: 2 Data size: 368 Basic stats: COMPLETE Column stats: NONE
                                target column name: ds
                                target work: Map 1
                          Select Operator
                            expressions: _col0 (type: string)
                            outputColumnNames: _col0
                            Statistics: Num rows: 2 Data size: 368 Basic stats: COMPLETE Column stats: NONE
                            Group By Operator
                              keys: _col0 (type: string)
                              mode: hash
                              outputColumnNames: _col0
                              Statistics: Num rows: 2 Data size: 368 Basic stats: COMPLETE Column stats: NONE
                              Spark Partition Pruning Sink Operator
                                partition key expr: ds
                                Statistics: Num rows: 2 Data size: 368 Basic stats: COMPLETE Column stats: NONE
                                target column name: ds
                                target work: Map 4
      
        Stage: Stage-1
          Spark
            Edges:
              Reducer 2 <- Map 1 (GROUP, 2)
              Reducer 3 <- Reducer 2 (PARTITION-LEVEL SORT, 2), Reducer 2 (PARTITION-LEVEL SORT, 2), Reducer 7 (PARTITION-LEVEL SORT, 2), Reducer 9 (PARTITION-LEVEL SORT, 2)
              Reducer 7 <- Map 6 (GROUP, 1)
              Reducer 9 <- Map 8 (GROUP, 1)
            DagName: root_20170622231525_20a777e5-e659-4138-b605-65f8395e18e2:1
            Vertices:
              Map 1 
                  Map Operator Tree:
                      TableScan
                        alias: srcpart
                        filterExpr: ds is not null (type: boolean)
                        Statistics: Num rows: 1 Data size: 23248 Basic stats: PARTIAL Column stats: NONE
                        Group By Operator
                          keys: ds (type: string)
                          mode: hash
                          outputColumnNames: _col0
                          Statistics: Num rows: 1 Data size: 23248 Basic stats: COMPLETE Column stats: NONE
                          Reduce Output Operator
                            key expressions: _col0 (type: string)
                            sort order: +
                            Map-reduce partition columns: _col0 (type: string)
                            Statistics: Num rows: 1 Data size: 23248 Basic stats: COMPLETE Column stats: NONE
              Map 6 
                  Map Operator Tree:
                      TableScan
                        alias: srcpart
                        Statistics: Num rows: 1 Data size: 23248 Basic stats: PARTIAL Column stats: NONE
                        Select Operator
                          expressions: ds (type: string)
                          outputColumnNames: ds
                          Statistics: Num rows: 1 Data size: 23248 Basic stats: PARTIAL Column stats: NONE
                          Group By Operator
                            aggregations: max(ds)
                            mode: hash
                            outputColumnNames: _col0
                            Statistics: Num rows: 1 Data size: 184 Basic stats: COMPLETE Column stats: NONE
                            Reduce Output Operator
                              sort order: 
                              Statistics: Num rows: 1 Data size: 184 Basic stats: COMPLETE Column stats: NONE
                              value expressions: _col0 (type: string)
              Map 8 
                  Map Operator Tree:
                      TableScan
                        alias: srcpart
                        Statistics: Num rows: 1 Data size: 23248 Basic stats: PARTIAL Column stats: NONE
                        Select Operator
                          expressions: ds (type: string)
                          outputColumnNames: ds
                          Statistics: Num rows: 1 Data size: 23248 Basic stats: PARTIAL Column stats: NONE
                          Group By Operator
                            aggregations: min(ds)
                            mode: hash
                            outputColumnNames: _col0
                            Statistics: Num rows: 1 Data size: 184 Basic stats: COMPLETE Column stats: NONE
                            Reduce Output Operator
                              sort order: 
                              Statistics: Num rows: 1 Data size: 184 Basic stats: COMPLETE Column stats: NONE
                              value expressions: _col0 (type: string)
              Reducer 2 
                  Reduce Operator Tree:
                    Group By Operator
                      keys: KEY._col0 (type: string)
                      mode: mergepartial
                      outputColumnNames: _col0
                      Statistics: Num rows: 1 Data size: 23248 Basic stats: COMPLETE Column stats: NONE
                      Reduce Output Operator
                        key expressions: _col0 (type: string)
                        sort order: +
                        Map-reduce partition columns: _col0 (type: string)
                        Statistics: Num rows: 2 Data size: 46496 Basic stats: COMPLETE Column stats: NONE
              Reducer 3 
                  Reduce Operator Tree:
                    Join Operator
                      condition map:
                           Left Semi Join 0 to 1
                      keys:
                        0 _col0 (type: string)
                        1 _col0 (type: string)
                      outputColumnNames: _col0
                      Statistics: Num rows: 2 Data size: 51145 Basic stats: COMPLETE Column stats: NONE
                      File Output Operator
                        compressed: false
                        Statistics: Num rows: 2 Data size: 51145 Basic stats: COMPLETE Column stats: NONE
                        table:
                            input format: org.apache.hadoop.mapred.SequenceFileInputFormat
                            output format: org.apache.hadoop.hive.ql.io.HiveSequenceFileOutputFormat
                            serde: org.apache.hadoop.hive.serde2.lazy.LazySimpleSerDe
              Reducer 7 
                  Reduce Operator Tree:
                    Group By Operator
                      aggregations: max(VALUE._col0)
                      mode: mergepartial
                      outputColumnNames: _col0
                      Statistics: Num rows: 1 Data size: 184 Basic stats: COMPLETE Column stats: NONE
                      Filter Operator
                        predicate: _col0 is not null (type: boolean)
                        Statistics: Num rows: 1 Data size: 184 Basic stats: COMPLETE Column stats: NONE
                        Group By Operator
                          keys: _col0 (type: string)
                          mode: hash
                          outputColumnNames: _col0
                          Statistics: Num rows: 2 Data size: 368 Basic stats: COMPLETE Column stats: NONE
                          Reduce Output Operator
                            key expressions: _col0 (type: string)
                            sort order: +
                            Map-reduce partition columns: _col0 (type: string)
                            Statistics: Num rows: 2 Data size: 368 Basic stats: COMPLETE Column stats: NONE
              Reducer 9 
                  Reduce Operator Tree:
                    Group By Operator
                      aggregations: min(VALUE._col0)
                      mode: mergepartial
                      outputColumnNames: _col0
                      Statistics: Num rows: 1 Data size: 184 Basic stats: COMPLETE Column stats: NONE
                      Filter Operator
                        predicate: _col0 is not null (type: boolean)
                        Statistics: Num rows: 1 Data size: 184 Basic stats: COMPLETE Column stats: NONE
                        Group By Operator
                          keys: _col0 (type: string)
                          mode: hash
                          outputColumnNames: _col0
                          Statistics: Num rows: 2 Data size: 368 Basic stats: COMPLETE Column stats: NONE
                          Reduce Output Operator
                            key expressions: _col0 (type: string)
                            sort order: +
                            Map-reduce partition columns: _col0 (type: string)
                            Statistics: Num rows: 2 Data size: 368 Basic stats: COMPLETE Column stats: NONE
      
        Stage: Stage-0
          Fetch Operator
            limit: -1
            Processor Tree:
              ListSink
      

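      The Spark Partition Pruning Sink Operators in Reducer 11 and Reducer 13 each name Map 4 as a target work, but Map 4 does not exist anywhere in the explain output (Stage-1 defines only Map 1, Map 6 and Map 8).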
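      A minimal sketch of how this kind of inconsistency could be detected mechanically, assuming the explain output is available as plain text in the layout shown above. The function name find_dangling_targets and the sample string are hypothetical and are not part of Hive; the script only compares "target work:" values against the vertex names that appear in the plan.

```python
import re


def find_dangling_targets(explain_text):
    """Return {vertex: [target works that are never defined as a vertex]}."""
    # A vertex header is a line holding only "Map N" or "Reducer N".
    # Edge lines such as "Reducer 11 <- Map 10 (GROUP, 1)" do not match.
    vertex_re = re.compile(r"^\s*((?:Map|Reducer) \d+)\s*$")
    target_re = re.compile(r"^\s*target work:\s*(.+?)\s*$")

    defined = set()
    targets = []  # (vertex that contains the sink, target work name)
    current = None
    for line in explain_text.splitlines():
        m = vertex_re.match(line)
        if m:
            current = m.group(1)
            defined.add(current)
            continue
        m = target_re.match(line)
        if m:
            targets.append((current, m.group(1)))

    dangling = {}
    for vertex, target in targets:
        if target not in defined:
            dangling.setdefault(vertex, []).append(target)
    return dangling


# Hypothetical, heavily trimmed plan used only to exercise the function.
sample = """\
Vertices:
  Map 1
      Map Operator Tree:
  Reducer 11
      Reduce Operator Tree:
          Spark Partition Pruning Sink Operator
            target work: Map 1
          Spark Partition Pruning Sink Operator
            target work: Map 4
"""

print(find_dangling_targets(sample))
```

      Run against the full plan in this report, a check like this would be expected to flag Map 4 for both Reducer 11 and Reducer 13, since Map 1 is defined in Stage-1 and Map 4 is not defined in any stage.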

      Attachments

        1. 17193_compare_RS_in_Map_5_1.PNG
          299 kB
          liyunzhang
        2. HIVE-16948_1.patch
          6 kB
          liyunzhang
        3. HIVE-16948.2.patch
          7 kB
          liyunzhang
        4. HIVE-16948.5.patch
          13 kB
          liyunzhang
        5. HIVE-16948.6.patch
          17 kB
          liyunzhang
        6. HIVE-16948.7.patch
          14 kB
          liyunzhang
        7. HIVE-16948.patch
          6 kB
          liyunzhang

            People

              Assignee: kellyzly liyunzhang
              Reporter: kellyzly liyunzhang
              Votes: 0
              Watchers: 8
