Uploaded image for project: 'Hive'
  1. Hive
  2. HIVE-22811

Stat based min/max aggregates are not serviced from colstats in nested cases

    XMLWordPrintableJSON

Details

    • Improvement
    • Status: Open
    • Major
    • Resolution: Unresolved
    • None
    • None
    • Statistics
    • None

    Description

      The statsOptimizer is able to use min/max/etc values to service simple queries

      (select max(id) from t t0)
      

      however the same doesn't happen for queries like:

      explain select * from u where u.id>(select max(id) from t t0);
      

      explain:

      | Plan optimized by CBO.                             |
      |                                                    |
      | Vertex dependency in root stage                    |
      | Reducer 3 <- Map 1 (BROADCAST_EDGE), Map 2 (CUSTOM_SIMPLE_EDGE) |
      |                                                    |
      | Stage-0                                            |
      |   Fetch Operator                                   |
      |     limit:-1                                       |
      |     Stage-1                                        |
      |       Reducer 3 vectorized                         |
      |       File Output Operator [FS_31]                 |
      |         Select Operator [SEL_30] (rows=1 width=8)  |
      |           Output:["_col0","_col1"]                 |
      |           Filter Operator [FIL_29] (rows=1 width=12) |
      |             predicate:(_col0 > _col2)              |
      |             Map Join Operator [MAPJOIN_28] (rows=3 width=12) |
      |               Conds:(Inner),Output:["_col0","_col1","_col2"] |
      |             <-Map 1 [BROADCAST_EDGE] vectorized    |
      |               BROADCAST [RS_25]                    |
      |                 Select Operator [SEL_24] (rows=3 width=8) |
      |                   Output:["_col0","_col1"]         |
      |                   Filter Operator [FIL_23] (rows=3 width=8) |
      |                     predicate:id is not null       |
      |                     TableScan [TS_0] (rows=3 width=8) |
      |                       default@u,u,Tbl:COMPLETE,Col:COMPLETE,Output:["id","cnt"] |
      |             <-Filter Operator [FIL_27] (rows=1 width=4) |
      |                 predicate:_col0 is not null        |
      |                 Group By Operator [GBY_26] (rows=1 width=4) |
      |                   Output:["_col0"],aggregations:["max(VALUE._col0)"] |
      |                 <-Map 2 [CUSTOM_SIMPLE_EDGE] vectorized |
      |                   PARTITION_ONLY_SHUFFLE [RS_22]   |
      |                     Group By Operator [GBY_21] (rows=1 width=4) |
      |                       Output:["_col0"],aggregations:["max(id)"] |
      |                       Select Operator [SEL_20] (rows=4 width=4) |
      |                         Output:["id"]              |
      |                         TableScan [TS_3] (rows=4 width=4) |
      |                           default@t,t0,Tbl:COMPLETE,Col:COMPLETE,Output:["id"] |
      
      

      Attachments

        Activity

          People

            Unassigned Unassigned
            kgyrtkirk Zoltan Haindrich
            Votes:
            0 Vote for this issue
            Watchers:
            1 Start watching this issue

            Dates

              Created:
              Updated: