Uploaded image for project: 'Hive'
  1. Hive
  2. HIVE-8747

Estimate number of rows for table with 0 rows overflows resulting in an in-efficient plan

    XMLWordPrintableJSON

    Details

    • Type: Bug
    • Status: Closed
    • Priority: Critical
    • Resolution: Cannot Reproduce
    • Affects Version/s: 0.14.0
    • Fix Version/s: 0.14.0
    • Component/s: Physical Optimizer
    • Labels:
      None

      Description

      ship_mode table has 0 rows.

      Query

      select count(*) 
      from
                web_sales
               ,date_dim
       	  ,ship_mode
           where
       web_sales.ws_sold_date_sk = date_dim.d_date_sk
       	and web_sales.ws_ship_mode_sk = ship_mode.sm_ship_mode_sk
              and d_year = 2002
       	and sm_carrier in ('DIAMOND','AIRBORNE')
      

      Explain

      STAGE PLANS:
        Stage: Stage-1
          Tez
            Edges:
              Map 1 <- Map 4 (BROADCAST_EDGE)
              Map 4 <- Map 3 (BROADCAST_EDGE)
              Reducer 2 <- Map 1 (SIMPLE_EDGE)
            DagName: mmokhtar_20141105180404_59e6fb65-529f-4eaa-9446-7f34d12bffac:30
            Vertices:
              Map 1
                  Map Operator Tree:
                      TableScan
                        alias: ship_mode
                        filterExpr: ((sm_carrier) IN ('DIAMOND', 'AIRBORNE') and sm_ship_mode_sk is not null) (type: boolean)
                        Statistics: Num rows: 0 Data size: 45 Basic stats: PARTIAL Column stats: COMPLETE
                        Filter Operator
                          predicate: ((sm_carrier) IN ('DIAMOND', 'AIRBORNE') and sm_ship_mode_sk is not null) (type: boolean)
                          Statistics: Num rows: 9223372036854775807 Data size: 9223372036854775807 Basic stats: COMPLETE Column stats: COMPLETE
                          Select Operator
                            expressions: sm_ship_mode_sk (type: int)
                            outputColumnNames: _col0
                            Statistics: Num rows: 9223372036854775807 Data size: 9223372036854775807 Basic stats: COMPLETE Column stats: COMPLETE
                            Map Join Operator
                              condition map:
                                   Inner Join 0 to 1
                              condition expressions:
                                0
                                1
                              keys:
                                0 _col1 (type: int)
                                1 _col0 (type: int)
                              input vertices:
                                0 Map 4
                              Statistics: Num rows: 9223372036854775807 Data size: 0 Basic stats: PARTIAL Column stats: COMPLETE
                              Select Operator
                                Statistics: Num rows: 9223372036854775807 Data size: 0 Basic stats: PARTIAL Column stats: COMPLETE
                                Group By Operator
                                  aggregations: count()
                                  mode: hash
                                  outputColumnNames: _col0
                                  Statistics: Num rows: 1 Data size: 8 Basic stats: COMPLETE Column stats: COMPLETE
                                  Reduce Output Operator
                                    sort order:
                                    Statistics: Num rows: 1 Data size: 8 Basic stats: COMPLETE Column stats: COMPLETE
                                    value expressions: _col0 (type: bigint)
                  Execution mode: vectorized
              Map 3
                  Map Operator Tree:
                      TableScan
                        alias: date_dim
                        filterExpr: ((d_year = 2002) and d_date_sk is not null) (type: boolean)
                        Statistics: Num rows: 73049 Data size: 81741831 Basic stats: COMPLETE Column stats: COMPLETE
                        Filter Operator
                          predicate: ((d_year = 2002) and d_date_sk is not null) (type: boolean)
                          Statistics: Num rows: 652 Data size: 5216 Basic stats: COMPLETE Column stats: COMPLETE
                          Select Operator
                            expressions: d_date_sk (type: int)
                            outputColumnNames: _col0
                            Statistics: Num rows: 652 Data size: 2608 Basic stats: COMPLETE Column stats: COMPLETE
                            Reduce Output Operator
                              key expressions: _col0 (type: int)
                              sort order: +
                              Map-reduce partition columns: _col0 (type: int)
                              Statistics: Num rows: 652 Data size: 2608 Basic stats: COMPLETE Column stats: COMPLETE
                  Execution mode: vectorized
              Map 4
                  Map Operator Tree:
                      TableScan
                        alias: web_sales
                        filterExpr: (ws_sold_date_sk is not null and ws_ship_mode_sk is not null) (type: boolean)
                        Statistics: Num rows: 143966864 Data size: 19577477788 Basic stats: COMPLETE Column stats: COMPLETE
                        Filter Operator
                          predicate: (ws_sold_date_sk is not null and ws_ship_mode_sk is not null) (type: boolean)
                          Statistics: Num rows: 143948856 Data size: 1151518824 Basic stats: COMPLETE Column stats: COMPLETE
                          Select Operator
                            expressions: ws_sold_date_sk (type: int), ws_ship_mode_sk (type: int)
                            outputColumnNames: _col0, _col1
                            Statistics: Num rows: 143948856 Data size: 1151518824 Basic stats: COMPLETE Column stats: COMPLETE
                            Map Join Operator
                              condition map:
                                   Inner Join 0 to 1
                              condition expressions:
                                0 {_col1}
                                1
                              keys:
                                0 _col0 (type: int)
                                1 _col0 (type: int)
                              outputColumnNames: _col1
                              input vertices:
                                1 Map 3
                              Statistics: Num rows: 1284818 Data size: 5139272 Basic stats: COMPLETE Column stats: COMPLETE
                              Reduce Output Operator
                                key expressions: _col1 (type: int)
                                sort order: +
                                Map-reduce partition columns: _col1 (type: int)
                                Statistics: Num rows: 1284818 Data size: 5139272 Basic stats: COMPLETE Column stats: COMPLETE
                  Execution mode: vectorized
              Reducer 2
                  Reduce Operator Tree:
                    Group By Operator
                      aggregations: count(VALUE._col0)
                      mode: mergepartial
                      outputColumnNames: _col0
                      Statistics: Num rows: 1 Data size: 8 Basic stats: COMPLETE Column stats: COMPLETE
                      Select Operator
                        expressions: _col0 (type: bigint)
                        outputColumnNames: _col0
                        Statistics: Num rows: 1 Data size: 8 Basic stats: COMPLETE Column stats: COMPLETE
                        File Output Operator
                          compressed: false
                          Statistics: Num rows: 1 Data size: 8 Basic stats: COMPLETE Column stats: COMPLETE
                          table:
                              input format: org.apache.hadoop.mapred.TextInputFormat
                              output format: org.apache.hadoop.hive.ql.io.HiveIgnoreKeyTextOutputFormat
                              serde: org.apache.hadoop.hive.serde2.lazy.LazySimpleSerDe
                  Execution mode: vectorized
      
        Stage: Stage-0
          Fetch Operator
            limit: -1
            Processor Tree:
              ListSink
      

        Attachments

          Activity

            People

            • Assignee:
              prasanth_j Prasanth Jayachandran
              Reporter:
              mmokhtar Mostafa Mokhtar
            • Votes:
              0 Vote for this issue
              Watchers:
              3 Start watching this issue

              Dates

              • Created:
                Updated:
                Resolved: