Uploaded image for project: 'Hive'
  1. Hive
  2. HIVE-16793

Scalar sub-query: sq_count_check not required if gby keys are constant

Log workAgile BoardRank to TopRank to BottomBulk Copy AttachmentsBulk Move AttachmentsVotersWatch issueWatchersCreate sub-taskConvert to sub-taskMoveLinkCloneLabelsUpdate Comment AuthorReplace String in CommentUpdate Comment VisibilityDelete Comments
    XMLWordPrintableJSON

Details

    • Bug
    • Status: Closed
    • Major
    • Resolution: Fixed
    • 3.0.0
    • 3.0.0
    • SQL
    • None

    Description

      This query has an sq_count check, though is useless on a constant key.

      hive> explain select * from part where p_size > (select max(p_size) from part where p_type = '1' group by p_type);
      Warning: Map Join MAPJOIN[37][bigTable=?] in task 'Map 1' is a cross product
      Warning: Map Join MAPJOIN[36][bigTable=?] in task 'Map 1' is a cross product
      OK
      Plan optimized by CBO.
      
      Vertex dependency in root stage
      Map 1 <- Reducer 4 (BROADCAST_EDGE), Reducer 6 (BROADCAST_EDGE)
      Reducer 3 <- Map 2 (SIMPLE_EDGE)
      Reducer 4 <- Reducer 3 (CUSTOM_SIMPLE_EDGE)
      Reducer 6 <- Map 5 (SIMPLE_EDGE)
      
      Stage-0
        Fetch Operator
          limit:-1
          Stage-1
            Map 1 vectorized, llap
            File Output Operator [FS_64]
              Select Operator [SEL_63] (rows=66666666 width=621)
                Output:["_col0","_col1","_col2","_col3","_col4","_col5","_col6","_col7","_col8"]
                Filter Operator [FIL_62] (rows=66666666 width=625)
                  predicate:(_col5 > _col10)
                  Map Join Operator [MAPJOIN_61] (rows=200000000 width=625)
                    Conds:(Inner),Output:["_col0","_col1","_col2","_col3","_col4","_col5","_col6","_col7","_col8","_col10"]
                  <-Reducer 6 [BROADCAST_EDGE] vectorized, llap
                    BROADCAST [RS_58]
                      Select Operator [SEL_57] (rows=1 width=4)
                        Output:["_col0"]
                        Group By Operator [GBY_56] (rows=1 width=89)
                          Output:["_col0","_col1"],aggregations:["max(VALUE._col0)"],keys:KEY._col0
                        <-Map 5 [SIMPLE_EDGE] vectorized, llap
                          SHUFFLE [RS_55]
                            PartitionCols:_col0
                            Group By Operator [GBY_54] (rows=86 width=89)
                              Output:["_col0","_col1"],aggregations:["max(_col1)"],keys:'1'
                              Select Operator [SEL_53] (rows=1212121 width=109)
                                Output:["_col1"]
                                Filter Operator [FIL_52] (rows=1212121 width=109)
                                  predicate:(p_type = '1')
                                  TableScan [TS_17] (rows=200000000 width=109)
                                    tpch_flat_orc_1000@part,part,Tbl:COMPLETE,Col:COMPLETE,Output:["p_type","p_size"]
                  <-Map Join Operator [MAPJOIN_60] (rows=200000000 width=621)
                      Conds:(Inner),Output:["_col0","_col1","_col2","_col3","_col4","_col5","_col6","_col7","_col8"]
                    <-Reducer 4 [BROADCAST_EDGE] vectorized, llap
                      BROADCAST [RS_51]
                        Select Operator [SEL_50] (rows=1 width=8)
                          Filter Operator [FIL_49] (rows=1 width=8)
                            predicate:(sq_count_check(_col0) <= 1)
                            Group By Operator [GBY_48] (rows=1 width=8)
                              Output:["_col0"],aggregations:["count(VALUE._col0)"]
                            <-Reducer 3 [CUSTOM_SIMPLE_EDGE] vectorized, llap
                              PARTITION_ONLY_SHUFFLE [RS_47]
                                Group By Operator [GBY_46] (rows=1 width=8)
                                  Output:["_col0"],aggregations:["count()"]
                                  Select Operator [SEL_45] (rows=1 width=85)
                                    Group By Operator [GBY_44] (rows=1 width=85)
                                      Output:["_col0"],keys:KEY._col0
                                    <-Map 2 [SIMPLE_EDGE] vectorized, llap
                                      SHUFFLE [RS_43]
                                        PartitionCols:_col0
                                        Group By Operator [GBY_42] (rows=83 width=85)
                                          Output:["_col0"],keys:'1'
                                          Select Operator [SEL_41] (rows=1212121 width=105)
                                            Filter Operator [FIL_40] (rows=1212121 width=105)
                                              predicate:(p_type = '1')
                                              TableScan [TS_2] (rows=200000000 width=105)
                                                tpch_flat_orc_1000@part,part,Tbl:COMPLETE,Col:COMPLETE,Output:["p_type"]
                    <-Select Operator [SEL_59] (rows=200000000 width=621)
                        Output:["_col0","_col1","_col2","_col3","_col4","_col5","_col6","_col7","_col8"]
                        TableScan [TS_0] (rows=200000000 width=621)
                          tpch_flat_orc_1000@part,part,Tbl:COMPLETE,Col:COMPLETE,Output:["p_partkey","p_name","p_mfgr","p_brand","p_type","p_size","p_container","p_retailprice","p_comment"]
      

      The other version without the filter is missing the check, though the compiler cannot assume the nDV of p_type. Fixed by HIVE-16851

      Attachments

        1. HIVE-16793.1.patch
          60 kB
          Vineet Garg
        2. HIVE-16793.2.patch
          294 kB
          Vineet Garg
        3. HIVE-16793.3.patch
          273 kB
          Vineet Garg
        4. HIVE-16793.4.patch
          475 kB
          Vineet Garg
        5. HIVE-16793.5.patch
          59 kB
          Vineet Garg
        6. HIVE-16793.6.patch
          58 kB
          Vineet Garg

        Issue Links

        Activity

          This comment will be Viewable by All Users Viewable by All Users
          Cancel

          People

            vgarg Vineet Garg Assign to me
            gopalv Gopal Vijayaraghavan
            Votes:
            0 Vote for this issue
            Watchers:
            5 Start watching this issue

            Dates

              Created:
              Updated:
              Resolved:

              Slack

                Issue deployment