Uploaded image for project: 'Hive'
  1. Hive
  2. HIVE-16793

Scalar sub-query: sq_count_check not required if gby keys are constant

    XMLWordPrintableJSON

Details

    • Bug
    • Status: Closed
    • Major
    • Resolution: Fixed
    • 3.0.0
    • 3.0.0
    • SQL
    • None

    Description

      This query has an sq_count check, though is useless on a constant key.

      hive> explain select * from part where p_size > (select max(p_size) from part where p_type = '1' group by p_type);
      Warning: Map Join MAPJOIN[37][bigTable=?] in task 'Map 1' is a cross product
      Warning: Map Join MAPJOIN[36][bigTable=?] in task 'Map 1' is a cross product
      OK
      Plan optimized by CBO.
      
      Vertex dependency in root stage
      Map 1 <- Reducer 4 (BROADCAST_EDGE), Reducer 6 (BROADCAST_EDGE)
      Reducer 3 <- Map 2 (SIMPLE_EDGE)
      Reducer 4 <- Reducer 3 (CUSTOM_SIMPLE_EDGE)
      Reducer 6 <- Map 5 (SIMPLE_EDGE)
      
      Stage-0
        Fetch Operator
          limit:-1
          Stage-1
            Map 1 vectorized, llap
            File Output Operator [FS_64]
              Select Operator [SEL_63] (rows=66666666 width=621)
                Output:["_col0","_col1","_col2","_col3","_col4","_col5","_col6","_col7","_col8"]
                Filter Operator [FIL_62] (rows=66666666 width=625)
                  predicate:(_col5 > _col10)
                  Map Join Operator [MAPJOIN_61] (rows=200000000 width=625)
                    Conds:(Inner),Output:["_col0","_col1","_col2","_col3","_col4","_col5","_col6","_col7","_col8","_col10"]
                  <-Reducer 6 [BROADCAST_EDGE] vectorized, llap
                    BROADCAST [RS_58]
                      Select Operator [SEL_57] (rows=1 width=4)
                        Output:["_col0"]
                        Group By Operator [GBY_56] (rows=1 width=89)
                          Output:["_col0","_col1"],aggregations:["max(VALUE._col0)"],keys:KEY._col0
                        <-Map 5 [SIMPLE_EDGE] vectorized, llap
                          SHUFFLE [RS_55]
                            PartitionCols:_col0
                            Group By Operator [GBY_54] (rows=86 width=89)
                              Output:["_col0","_col1"],aggregations:["max(_col1)"],keys:'1'
                              Select Operator [SEL_53] (rows=1212121 width=109)
                                Output:["_col1"]
                                Filter Operator [FIL_52] (rows=1212121 width=109)
                                  predicate:(p_type = '1')
                                  TableScan [TS_17] (rows=200000000 width=109)
                                    tpch_flat_orc_1000@part,part,Tbl:COMPLETE,Col:COMPLETE,Output:["p_type","p_size"]
                  <-Map Join Operator [MAPJOIN_60] (rows=200000000 width=621)
                      Conds:(Inner),Output:["_col0","_col1","_col2","_col3","_col4","_col5","_col6","_col7","_col8"]
                    <-Reducer 4 [BROADCAST_EDGE] vectorized, llap
                      BROADCAST [RS_51]
                        Select Operator [SEL_50] (rows=1 width=8)
                          Filter Operator [FIL_49] (rows=1 width=8)
                            predicate:(sq_count_check(_col0) <= 1)
                            Group By Operator [GBY_48] (rows=1 width=8)
                              Output:["_col0"],aggregations:["count(VALUE._col0)"]
                            <-Reducer 3 [CUSTOM_SIMPLE_EDGE] vectorized, llap
                              PARTITION_ONLY_SHUFFLE [RS_47]
                                Group By Operator [GBY_46] (rows=1 width=8)
                                  Output:["_col0"],aggregations:["count()"]
                                  Select Operator [SEL_45] (rows=1 width=85)
                                    Group By Operator [GBY_44] (rows=1 width=85)
                                      Output:["_col0"],keys:KEY._col0
                                    <-Map 2 [SIMPLE_EDGE] vectorized, llap
                                      SHUFFLE [RS_43]
                                        PartitionCols:_col0
                                        Group By Operator [GBY_42] (rows=83 width=85)
                                          Output:["_col0"],keys:'1'
                                          Select Operator [SEL_41] (rows=1212121 width=105)
                                            Filter Operator [FIL_40] (rows=1212121 width=105)
                                              predicate:(p_type = '1')
                                              TableScan [TS_2] (rows=200000000 width=105)
                                                tpch_flat_orc_1000@part,part,Tbl:COMPLETE,Col:COMPLETE,Output:["p_type"]
                    <-Select Operator [SEL_59] (rows=200000000 width=621)
                        Output:["_col0","_col1","_col2","_col3","_col4","_col5","_col6","_col7","_col8"]
                        TableScan [TS_0] (rows=200000000 width=621)
                          tpch_flat_orc_1000@part,part,Tbl:COMPLETE,Col:COMPLETE,Output:["p_partkey","p_name","p_mfgr","p_brand","p_type","p_size","p_container","p_retailprice","p_comment"]
      

      The other version without the filter is missing the check, though the compiler cannot assume the nDV of p_type. Fixed by HIVE-16851

      Attachments

        1. HIVE-16793.1.patch
          60 kB
          Vineet Garg
        2. HIVE-16793.2.patch
          294 kB
          Vineet Garg
        3. HIVE-16793.3.patch
          273 kB
          Vineet Garg
        4. HIVE-16793.4.patch
          475 kB
          Vineet Garg
        5. HIVE-16793.5.patch
          59 kB
          Vineet Garg
        6. HIVE-16793.6.patch
          58 kB
          Vineet Garg

        Issue Links

          Activity

            People

              vgarg Vineet Garg
              gopalv Gopal Vijayaraghavan
              Votes:
              0 Vote for this issue
              Watchers:
              4 Start watching this issue

              Dates

                Created:
                Updated:
                Resolved: