Uploaded image for project: 'Hive'
  1. Hive
  2. HIVE-13693

Multi-insert query drops Filter before file output when there is a.val <> b.val

Log workAgile BoardRank to TopRank to BottomBulk Copy AttachmentsBulk Move AttachmentsVotersWatch issueWatchersCreate sub-taskConvert to sub-taskMoveLinkCloneLabelsUpdate Comment AuthorReplace String in CommentUpdate Comment VisibilityDelete Comments
    XMLWordPrintableJSON

Details

    • Bug
    • Status: Closed
    • Major
    • Resolution: Fixed
    • 1.3.0, 2.0.0, 2.1.0
    • 2.1.0
    • Logical Optimizer
    • None

    Description

      To reproduce:

      CREATE TABLE T_A ( id STRING, val STRING ); 
      CREATE TABLE T_B ( id STRING, val STRING ); 
      CREATE TABLE join_result_1 ( ida STRING, vala STRING, idb STRING, valb STRING ); 
      CREATE TABLE join_result_3 ( ida STRING, vala STRING, idb STRING, valb STRING ); 
      
      INSERT INTO TABLE T_A 
      VALUES ('Id_1', 'val_101'), ('Id_2', 'val_102'), ('Id_3', 'val_103'); 
      
      INSERT INTO TABLE T_B 
      VALUES ('Id_1', 'val_103'), ('Id_2', 'val_104'); 
      
      explain
      FROM T_A a LEFT JOIN T_B b ON a.id = b.id
      INSERT OVERWRITE TABLE join_result_1
      SELECT a.*, b.*
      WHERE b.id = 'Id_1' AND b.val = 'val_103'
      INSERT OVERWRITE TABLE join_result_3
      SELECT a.*, b.*
      WHERE b.val = 'val_104' AND b.id = 'Id_2' AND a.val <> b.val;
      

      The (wrong) plan is the following:

      STAGE DEPENDENCIES:
        Stage-2 is a root stage
        Stage-3 depends on stages: Stage-2
        Stage-0 depends on stages: Stage-3
        Stage-4 depends on stages: Stage-0
        Stage-1 depends on stages: Stage-3
        Stage-5 depends on stages: Stage-1
      
      STAGE PLANS:
        Stage: Stage-2
          Tez
            DagId: haha_20160504140944_174465c9-5d1a-42f9-9665-fae02eeb2767:2
            Edges:
              Reducer 2 <- Map 1 (SIMPLE_EDGE), Map 3 (SIMPLE_EDGE)
            DagName: 
            Vertices:
              Map 1 
                  Map Operator Tree:
                      TableScan
                        alias: a
                        Statistics: Num rows: 3 Data size: 36 Basic stats: COMPLETE Column stats: NONE
                        Reduce Output Operator
                          key expressions: id (type: string)
                          sort order: +
                          Map-reduce partition columns: id (type: string)
                          Statistics: Num rows: 3 Data size: 36 Basic stats: COMPLETE Column stats: NONE
                          value expressions: val (type: string)
              Map 3 
                  Map Operator Tree:
                      TableScan
                        alias: b
                        Statistics: Num rows: 2 Data size: 24 Basic stats: COMPLETE Column stats: NONE
                        Reduce Output Operator
                          key expressions: id (type: string)
                          sort order: +
                          Map-reduce partition columns: id (type: string)
                          Statistics: Num rows: 2 Data size: 24 Basic stats: COMPLETE Column stats: NONE
                          value expressions: val (type: string)
              Reducer 2 
                  Reduce Operator Tree:
                    Merge Join Operator
                      condition map:
                           Left Outer Join0 to 1
                      keys:
                        0 id (type: string)
                        1 id (type: string)
                      outputColumnNames: _col0, _col1, _col6
                      Statistics: Num rows: 3 Data size: 39 Basic stats: COMPLETE Column stats: NONE
                      Select Operator
                        expressions: _col0 (type: string), _col1 (type: string), 'Id_1' (type: string), 'val_103' (type: string)
                        outputColumnNames: _col0, _col1, _col2, _col3
                        Statistics: Num rows: 3 Data size: 39 Basic stats: COMPLETE Column stats: NONE
                        File Output Operator
                          compressed: false
                          Statistics: Num rows: 3 Data size: 39 Basic stats: COMPLETE Column stats: NONE
                          table:
                              input format: org.apache.hadoop.mapred.TextInputFormat
                              output format: org.apache.hadoop.hive.ql.io.HiveIgnoreKeyTextOutputFormat
                              serde: org.apache.hadoop.hive.serde2.lazy.LazySimpleSerDe
                              name: bugtest2.join_result_1
                      Filter Operator
                        predicate: (_col1 <> _col6) (type: boolean)
                        Statistics: Num rows: 3 Data size: 39 Basic stats: COMPLETE Column stats: NONE
                        Select Operator
                          expressions: _col0 (type: string), _col1 (type: string), 'Id_2' (type: string), 'val_104' (type: string)
                          outputColumnNames: _col0, _col1, _col2, _col3
                          Statistics: Num rows: 3 Data size: 39 Basic stats: COMPLETE Column stats: NONE
                          File Output Operator
                            compressed: false
                            Statistics: Num rows: 3 Data size: 39 Basic stats: COMPLETE Column stats: NONE
                            table:
                                input format: org.apache.hadoop.mapred.TextInputFormat
                                output format: org.apache.hadoop.hive.ql.io.HiveIgnoreKeyTextOutputFormat
                                serde: org.apache.hadoop.hive.serde2.lazy.LazySimpleSerDe
                                name: bugtest2.join_result_3
      
        Stage: Stage-3
          Dependency Collection
      
        Stage: Stage-0
          Move Operator
            tables:
                replace: true
                table:
                    input format: org.apache.hadoop.mapred.TextInputFormat
                    output format: org.apache.hadoop.hive.ql.io.HiveIgnoreKeyTextOutputFormat
                    serde: org.apache.hadoop.hive.serde2.lazy.LazySimpleSerDe
                    name: bugtest2.join_result_1
      
        Stage: Stage-4
          Stats-Aggr Operator
      
        Stage: Stage-1
          Move Operator
            tables:
                replace: true
                table:
                    input format: org.apache.hadoop.mapred.TextInputFormat
                    output format: org.apache.hadoop.hive.ql.io.HiveIgnoreKeyTextOutputFormat
                    serde: org.apache.hadoop.hive.serde2.lazy.LazySimpleSerDe
                    name: bugtest2.join_result_3
      
        Stage: Stage-5
          Stats-Aggr Operator
      

      Attachments

        1. HIVE-13693.02.patch
          88 kB
          jcamachorodriguez
        2. HIVE-13693.01.patch
          50 kB
          jcamachorodriguez
        3. HIVE-13693.01.patch
          50 kB
          jcamachorodriguez
        4. HIVE-13693.patch
          30 kB
          jcamachorodriguez

        Issue Links

        Activity

          This comment will be Viewable by All Users Viewable by All Users
          Cancel

          People

            jcamacho Jesús Camacho Rodríguez Assign to me
            jcamacho Jesús Camacho Rodríguez
            Votes:
            0 Vote for this issue
            Watchers:
            4 Start watching this issue

            Dates

              Created:
              Updated:
              Resolved:

              Slack

                Issue deployment