Hive / HIVE-25170

Data error in constant propagation caused by wrong colExprMap generated in SemanticAnalyzer


    Description

       

      
      SET hive.remove.orderby.in.subquery=false;
      
      EXPLAIN
      SELECT constant_col, key, max(value)
      FROM
      (
        SELECT 'constant' as constant_col, key, value
        FROM src
        DISTRIBUTE BY constant_col, key
        SORT BY constant_col, key, value
      ) a
      GROUP BY constant_col, key
      LIMIT 10;
      
      OK
      Vertex dependency in root stage
      Reducer 2 <- Map 1 (SIMPLE_EDGE)
      Reducer 3 <- Reducer 2 (SIMPLE_EDGE)

      Stage-0
        Fetch Operator
          limit:10
          Stage-1
            Reducer 3
            File Output Operator [FS_10]
              Limit [LIM_9] (rows=1 width=368)
                Number of rows:10
                Select Operator [SEL_8] (rows=1 width=368)
                  Output:["_col0","_col1","_col2"]
                  Group By Operator [GBY_7] (rows=1 width=368)
                    Output:["_col0","_col1","_col2"],aggregations:["max(VALUE._col0)"],keys:'constant', 'constant'
                  <-Reducer 2 [SIMPLE_EDGE]
                    SHUFFLE [RS_6]
                      PartitionCols:'constant', 'constant'
                      Group By Operator [GBY_5] (rows=1 width=368)
                        Output:["_col0","_col1","_col2"],aggregations:["max(_col2)"],keys:'constant', 'constant'
                        Select Operator [SEL_3] (rows=500 width=178)
                          Output:["_col2"]
                        <-Map 1 [SIMPLE_EDGE]
                          SHUFFLE [RS_2]
                            PartitionCols:'constant', _col1
                            Select Operator [SEL_1] (rows=500 width=178)
                              Output:["_col1","_col2"]
                              TableScan [TS_0] (rows=500 width=10)
                                src,src,Tbl:COMPLETE,Col:COMPLETE,Output:["key","value"]

      The PartitionCols of RS_6 in Reducer 2 are wrong: instead of 'constant', 'constant', they should be 'constant', _col1. The keys of GBY_5 and GBY_7 show the same error.

       

      The cause is that, since HIVE-13808, SemanticAnalyzer builds the key part of the colExprMap from sortCols, while the key columns themselves are generated from newSortCols. When the constant column is not the trailing key column, the key column names and their mapped expressions fall out of alignment.
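      The misalignment can be sketched with a toy Java model. This is not Hive's code: the class, its helpers, and the internal key names (KEY.reducesinkkey0, and so on) are hypothetical or illustrative, and the sketch assumes, as the description implies, that newSortCols is sortCols with the constant expressions removed. It builds the key part of colExprMap once from sortCols (the buggy behaviour) and once from newSortCols (what the key columns actually hold).

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Toy model of the ReduceSink key columns and the key part of colExprMap.
// Hypothetical names; this is not Hive's SemanticAnalyzer API.
class ReduceSinkKeySketch {
    // Illustrative internal key-column prefix.
    static final String KEY_PREFIX = "KEY.reducesinkkey";

    // In this model, a single-quoted string is a constant expression.
    static boolean isConstant(String expr) {
        return expr.startsWith("'");
    }

    // Assumption: newSortCols is sortCols with constant expressions removed.
    static List<String> newSortCols(List<String> sortCols) {
        List<String> out = new ArrayList<>();
        for (String expr : sortCols) {
            if (!isConstant(expr)) {
                out.add(expr);
            }
        }
        return out;
    }

    // Maps the i-th internal key column name to the i-th expression.
    static Map<String, String> keyColExprMap(List<String> exprs) {
        Map<String, String> map = new LinkedHashMap<>();
        for (int i = 0; i < exprs.size(); i++) {
            map.put(KEY_PREFIX + i, exprs.get(i));
        }
        return map;
    }

    public static void main(String[] args) {
        List<String> sortCols = List.of("'constant'", "key", "value");
        List<String> keyCols = newSortCols(sortCols);          // actual key columns
        Map<String, String> buggy = keyColExprMap(sortCols);   // built from sortCols
        Map<String, String> fixed = keyColExprMap(keyCols);    // built from newSortCols
        for (int i = 0; i < keyCols.size(); i++) {
            String name = KEY_PREFIX + i;
            System.out.println(name + " holds " + keyCols.get(i)
                    + ", buggy map says " + buggy.get(name)
                    + ", fixed map says " + fixed.get(name));
        }
    }
}
```

      In this model the buggy map claims KEY.reducesinkkey0 carries 'constant' although it actually carries key. If the constant is moved to the end (sortCols = key, value, 'constant'), both maps agree on every real key column and the buggy map only gains an extra, unused entry, which fits the observation that the problem appears only when the constant is not the trailing key column.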

      The constant propagation optimizer reads this colExprMap, finds a constant expression mapped to a key column that does not actually hold a constant, and folds that column into 'constant', which produces the wrong keys shown above.
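      How a misaligned map becomes wrong keys can be sketched the same way. In this simplified Java model (hypothetical names, not Hive's actual constant propagation code), each downstream group-by key refers either to a literal or to a ReduceSink key column, and a reference is folded to a literal whenever colExprMap maps it to a constant expression. The two maps correspond to the ones built from sortCols and from newSortCols for the query above, again assuming that newSortCols drops the constant.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Simplified model of constant folding through a ReduceSink colExprMap.
// Hypothetical names; not Hive's constant propagation optimizer.
class ConstantFoldSketch {
    // In this model, a single-quoted string is a constant expression.
    static boolean isConstant(String expr) {
        return expr.startsWith("'");
    }

    // Replaces each key reference by a literal if colExprMap says the
    // referenced column carries a constant; otherwise keeps the reference.
    static List<String> foldKeys(List<String> keyRefs, Map<String, String> colExprMap) {
        List<String> folded = new ArrayList<>();
        for (String ref : keyRefs) {
            String expr = colExprMap.getOrDefault(ref, ref);
            folded.add(isConstant(expr) ? expr : ref);
        }
        return folded;
    }

    public static void main(String[] args) {
        // Group-by keys after the ReduceSink: the literal and the column holding `key`.
        List<String> keyRefs = List.of("'constant'", "KEY.reducesinkkey0");

        Map<String, String> buggy = new LinkedHashMap<>();  // built from sortCols
        buggy.put("KEY.reducesinkkey0", "'constant'");
        buggy.put("KEY.reducesinkkey1", "key");
        buggy.put("KEY.reducesinkkey2", "value");

        Map<String, String> fixed = new LinkedHashMap<>();  // built from newSortCols
        fixed.put("KEY.reducesinkkey0", "key");
        fixed.put("KEY.reducesinkkey1", "value");

        System.out.println("buggy: " + foldKeys(keyRefs, buggy));
        System.out.println("fixed: " + foldKeys(keyRefs, fixed));
    }
}
```

      With the buggy map both keys fold to 'constant', mirroring the keys:'constant', 'constant' in the plan; with the correctly aligned map the second key stays a column reference.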

       

      Since colExprMap is used by several optimizers besides constant propagation, this mismatch is a serious problem.


    People

      Assignee: Wei Zhang (zhangweilst)
      Reporter: Wei Zhang (zhangweilst)
      Votes: 0
      Watchers: 3


    Time Tracking

      Original Estimate: Not Specified
      Remaining Estimate: 0h
      Time Spent: 0.5h
