Uploaded image for project: 'Hive'
  1. Hive
  2. HIVE-11160 Auto-gather column stats
  3. HIVE-18454

Incorrect rownum estimation in joins

    XMLWordPrintableJSON

Details

    • Sub-task
    • Status: Open
    • Major
    • Resolution: Unresolved
    • None
    • None
    • Statistics
    • None

    Description

      row counts seems to be off the charts...sometimes ~120000 rows estimated when the table has only 10 rows

      create table s (x int);
      
      insert into s values
      (1),(2),(3),(4),(5),
      (6),(7),(8),(9),(10);
      
      create table tu(id_uv int,id_uw int,u int);
      create table tv(id_uv int,v int);
      create table tw(id_uw int,w int);
      
      from s
      insert overwrite table tu
              select x,x,x 
              where x<=6 or x=10
      insert overwrite table tv
              select x,x              
              where x<=3 or x=10
      insert overwrite table tw
              select x,x              
      ;
      
      set hive.explain.user=true;
      
      explain analyze
      select sum(u*v*w) from tu
              join tv on (tu.id_uv=tv.id_uv)
              join tw on (tu.id_uw=tw.id_uw)
              where w>9 and u>1 and v>3;
      
      desc formatted tv;
      

      explain analyze output:

      | Plan optimized by CBO.                             |
      |                                                    |
      | Vertex dependency in root stage                    |
      | Map 1 <- Map 3 (BROADCAST_EDGE), Map 4 (BROADCAST_EDGE) |
      | Reducer 2 <- Map 1 (CUSTOM_SIMPLE_EDGE)            |
      |                                                    |
      | Stage-0                                            |
      |   Fetch Operator                                   |
      |     limit:-1                                       |
      |     Stage-1                                        |
      |       Reducer 2                                    |
      |       File Output Operator [FS_21]                 |
      |         Group By Operator [GBY_19] (rows=1/1 width=8) |
      |           Output:["_col0"],aggregations:["sum(VALUE._col0)"] |
      |         <-Map 1 [CUSTOM_SIMPLE_EDGE]               |
      |           PARTITION_ONLY_SHUFFLE [RS_18]           |
      |             Group By Operator [GBY_17] (rows=1/1 width=8) |
      |               Output:["_col0"],aggregations:["sum(_col0)"] |
      |               Select Operator [SEL_15] (rows=48400/1 width=5) |
      |                 Output:["_col0"]                   |
      |                 Map Join Operator [MAPJOIN_31] (rows=48400/1 width=5) |
      |                   Conds:MAPJOIN_30._col1=RS_13._col0(Inner),HybridGraceHashJoin:true,Output:["_col2","_col4","_col6"] |
      |                 <-Map 4 [BROADCAST_EDGE]           |
      |                   BROADCAST [RS_13]                |
      |                     PartitionCols:_col0            |
      |                     Select Operator [SEL_8] (rows=3/1 width=3) |
      |                       Output:["_col0","_col1"]     |
      |                       Filter Operator [FIL_29] (rows=3/1 width=3) |
      |                         predicate:((w > 9) and id_uw is not null) |
      |                         TableScan [TS_6] (rows=10/10 width=3) |
      |                           default@tw,tw,Tbl:COMPLETE,Col:NONE,Output:["id_uw","w"] |
      |                 <-Map Join Operator [MAPJOIN_30] (rows=44000/1 width=5) |
      |                     Conds:SEL_2._col0=RS_10._col0(Inner),HybridGraceHashJoin:true,Output:["_col1","_col2","_col4"] |
      |                   <-Map 3 [BROADCAST_EDGE]         |
      |                     BROADCAST [RS_10]              |
      |                       PartitionCols:_col0          |
      |                       Select Operator [SEL_5] (rows=1632/1 width=3) |
      |                         Output:["_col0","_col1"]   |
      |                         Filter Operator [FIL_28] (rows=1632/1 width=3) |
      |                           predicate:((v > 3) and id_uv is not null) |
      |                           TableScan [TS_3] (rows=4898/4 width=3) |
      |                             default@tv,tv,Tbl:COMPLETE,Col:NONE,Output:["id_uv","v"] |
      |                   <-Select Operator [SEL_2] (rows=40000/6 width=5) |
      |                       Output:["_col0","_col1","_col2"] |
      |                       Filter Operator [FIL_27] (rows=40000/6 width=5) |
      |                         predicate:((u > 1) and id_uv is not null and id_uw is not null) |
      |                         TableScan [TS_0] (rows=120000/7 width=5) |
      |                           default@tu,tu,Tbl:COMPLETE,Col:NONE,Output:["id_uv","id_uw","u"] |
      

      Attachments

        Activity

          People

            Unassigned Unassigned
            kgyrtkirk Zoltan Haindrich
            Votes:
            0 Vote for this issue
            Watchers:
            4 Start watching this issue

            Dates

              Created:
              Updated: