HIVE-19653 (project: Hive)

Incorrect predicate pushdown for groupby with grouping sets


Details

    Description

      Consider the following query:

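      -- Configuration under which the problem reproduces.
      SET hive.optimize.ppd=true;
      SET hive.cbo.enable=false;

      CREATE TABLE T1(a STRING, b STRING, s BIGINT);
      INSERT OVERWRITE TABLE T1 VALUES ('aaaa', 'bbbb', 123456);

      -- The outer WHERE filters the output of the GBY with grouping sets.
      SELECT * FROM (
      SELECT a, b, sum(s)
      FROM T1
      GROUP BY a, b GROUPING SETS ((), (a), (b), (a, b))
      ) t WHERE a IS NOT NULL;
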

      When hive.optimize.ppd is enabled (and hive.cbo.enable=false), the query will output:

      NULL	NULL	123456
      NULL	bbbb	123456
      aaaa	NULL	123456
      aaaa	bbbb	123456
      

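      The predicate "a IS NOT NULL" has no effect: rows in which a is NULL are still returned, which is incorrect. The expected result contains only the two rows produced by the (a) and (a, b) grouping sets, i.e. "aaaa NULL 123456" and "aaaa bbbb 123456".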
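
      To see why the output above is wrong, the following Python sketch models GROUPING SETS evaluation over the single row in T1 and compares two plans: filtering the GBY output (what the query asks for) versus filtering the input rows (what pushing the predicate below the GBY amounts to). It is a simplified, hypothetical model, not Hive code; the helper group_by_grouping_sets is illustrative only, and Python's None stands in for SQL NULL.

```python
# Hypothetical in-memory model of GROUP BY ... GROUPING SETS, for illustration only.
ROWS = [{"a": "aaaa", "b": "bbbb", "s": 123456}]  # the single row inserted into T1
KEYS = ("a", "b")
GROUPING_SETS = [(), ("a",), ("b",), ("a", "b")]


def group_by_grouping_sets(rows, keys, grouping_sets):
    """Compute sum(s) per grouping set; keys outside a set are emitted as None (NULL)."""
    out = []
    for gset in grouping_sets:
        sums = {}
        for row in rows:
            # A key that is not part of this grouping set is replaced by NULL.
            group = tuple(row[k] if k in gset else None for k in keys)
            sums[group] = sums.get(group, 0) + row["s"]
        out.extend(group + (total,) for group, total in sums.items())
    return out


# Correct plan: evaluate "a IS NOT NULL" on the GBY output (column 0 is a).
correct = [r for r in group_by_grouping_sets(ROWS, KEYS, GROUPING_SETS) if r[0] is not None]

# Faulty plan: push "a IS NOT NULL" below the GBY and filter the input rows instead.
pushed_down = group_by_grouping_sets(
    [row for row in ROWS if row["a"] is not None], KEYS, GROUPING_SETS
)

print(correct)      # 2 rows
print(pushed_down)  # 4 rows, including rows where a is None
```

      The input row already has a non-NULL a, so filtering it before the GBY removes nothing; the NULLs are introduced by the GBY itself, which is why the predicate must be evaluated after it.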

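      When performing predicate pushdown (PPD) through a GroupBy (GBY) operator, we should push a predicate below the GBY only if every grouping set contains all the columns the predicate references. Otherwise, the grouping sets that omit such a column output NULL for it, so the column's value after the GBY differs from its value before the GBY, and the pushed-down predicate produces a wrong result.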
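
      The condition stated above can be sketched as a check over the grouping sets. The function below is a hypothetical illustration written in Python for brevity; it is not taken from the HIVE-19653 patch, and its name and signature are assumptions.

```python
# Hypothetical sketch of the pushdown safety check; not Hive's actual implementation.
from typing import Iterable, Set, Tuple


def can_push_through_grouping_sets(
    predicate_columns: Set[str],
    grouping_sets: Iterable[Tuple[str, ...]],
) -> bool:
    """Return True only if every grouping set contains every column the predicate reads.

    A column missing from some grouping set is replaced by NULL in that set's output
    rows, so its value after the GBY differs from its value before the GBY.
    """
    return all(predicate_columns <= set(gset) for gset in grouping_sets)


GROUPING_SETS = [(), ("a",), ("b",), ("a", "b")]

# "a IS NOT NULL" reads column a, which the () and (b) sets drop: keep it above the GBY.
print(can_push_through_grouping_sets({"a"}, GROUPING_SETS))  # False

# With GROUPING SETS ((a), (a, b)), column a survives in every set, so pushdown is safe.
print(can_push_through_grouping_sets({"a"}, [("a",), ("a", "b")]))  # True
```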

      Attachments

        1. HIVE-19653.patch
          5 kB
          Zhang Li
        2. HIVE-19653.1.patch
          6 kB
          Zhang Li


          People

            Assignee: Zhihua Deng (dengzh)
            Reporter: Zhang Li (richox)
            Votes: 0
            Watchers: 5

            Dates

              Created:
              Updated:
              Resolved:

              Time Tracking

              Estimated: Not Specified
              Remaining: 0h
              Logged: 2h
