Uploaded image for project: 'Hive'
  1. Hive
  2. HIVE-7985

With CBO enabled cross product is generated when a subquery is present

    XMLWordPrintableJSON

Details

    • Bug
    • Status: Closed
    • Critical
    • Resolution: Fixed
    • 0.14.0
    • 0.14.0
    • CBO
    • None

    Description

      This is a regression introduced in the latest build of the CBO branch.
      Removing the subquery for item will remove the cross products

      Query

      select i_item_id,sum(ss_ext_sales_price) total_sales from store_sales, date_dim, item where item.i_item_id in (select i.i_item_id from item i where i_color in ('purple','burlywood','indian')) and ss_item_sk = i_item_sk and ss_sold_date_sk = d_date_sk and d_year = 2001 and d_moy = 1 group by i_item_id;
      
      Warning: Map Join MAPJOIN[38][bigTable=?] in task 'Map 1' is a cross product
      Warning: Map Join MAPJOIN[39][bigTable=store_sales] in task 'Map 4' is a cross product
      OK
      STAGE DEPENDENCIES:
        Stage-1 is a root stage
        Stage-0 depends on stages: Stage-1
      
      STAGE PLANS:
        Stage: Stage-1
          Tez
            Edges:
              Map 1 <- Map 3 (BROADCAST_EDGE)
              Map 4 <- Map 1 (BROADCAST_EDGE), Map 2 (BROADCAST_EDGE)
              Reducer 5 <- Map 4 (SIMPLE_EDGE)
            DagName: mmokhtar_20140904141313_9c253f7e-aad1-4ca4-9be1-ea45e3d34496:1
            Vertices:
              Map 1
                  Map Operator Tree:
                      TableScan
                        alias: item
                        filterExpr: (true and i_item_id is not null) (type: boolean)
                        Statistics: Num rows: 462000 Data size: 663862160 Basic stats: COMPLETE Column stats: NONE
                        Filter Operator
                          predicate: i_item_id is not null (type: boolean)
                          Statistics: Num rows: 231000 Data size: 331931080 Basic stats: COMPLETE Column stats: NONE
                          Map Join Operator
                            condition map:
                                 Inner Join 0 to 1
                            condition expressions:
                              0 {i_item_sk} {i_item_id}
                              1 {d_date_sk}
                            keys:
                              0
                              1
                            outputColumnNames: _col0, _col1, _col25
                            input vertices:
                              1 Map 3
                            Statistics: Num rows: 254100 Data size: 365124192 Basic stats: COMPLETE Column stats: NONE
                            Select Operator
                              expressions: _col0 (type: int), _col1 (type: string), _col25 (type: int)
                              outputColumnNames: _col0, _col1, _col25
                              Statistics: Num rows: 254100 Data size: 365124192 Basic stats: COMPLETE Column stats: NONE
                              Reduce Output Operator
                                sort order:
                                Statistics: Num rows: 254100 Data size: 365124192 Basic stats: COMPLETE Column stats: NONE
                                value expressions: _col0 (type: int), _col1 (type: string), _col25 (type: int)
                  Execution mode: vectorized
              Map 2
                  Map Operator Tree:
                      TableScan
                        alias: i
                        filterExpr: ((i_color) IN ('purple', 'burlywood', 'indian') and i_item_id is not null) (type: boolean)
                        Statistics: Num rows: 462000 Data size: 663862160 Basic stats: COMPLETE Column stats: NONE
                        Filter Operator
                          predicate: ((i_color) IN ('purple', 'burlywood', 'indian') and i_item_id is not null) (type: boolean)
                          Statistics: Num rows: 115500 Data size: 165965540 Basic stats: COMPLETE Column stats: NONE
                          Select Operator
                            expressions: i_item_id (type: string)
                            outputColumnNames: _col0
                            Statistics: Num rows: 115500 Data size: 165965540 Basic stats: COMPLETE Column stats: NONE
                            Group By Operator
                              keys: _col0 (type: string)
                              mode: hash
                              outputColumnNames: _col0
                              Statistics: Num rows: 115500 Data size: 165965540 Basic stats: COMPLETE Column stats: NONE
                              Reduce Output Operator
                                key expressions: _col0 (type: string)
                                sort order: +
                                Map-reduce partition columns: _col0 (type: string)
                                Statistics: Num rows: 115500 Data size: 165965540 Basic stats: COMPLETE Column stats: NONE
                  Execution mode: vectorized
              Map 3
                  Map Operator Tree:
                      TableScan
                        alias: date_dim
                        filterExpr: ((d_year = 2001) and (d_moy = 1)) (type: boolean)
                        Statistics: Num rows: 73049 Data size: 81741831 Basic stats: COMPLETE Column stats: NONE
                        Filter Operator
                          predicate: ((d_year = 2001) and (d_moy = 1)) (type: boolean)
                          Statistics: Num rows: 18262 Data size: 20435178 Basic stats: COMPLETE Column stats: NONE
                          Reduce Output Operator
                            sort order:
                            Statistics: Num rows: 18262 Data size: 20435178 Basic stats: COMPLETE Column stats: NONE
                            value expressions: d_date_sk (type: int)
                  Execution mode: vectorized
              Map 4
                  Map Operator Tree:
                      TableScan
                        alias: store_sales
                        Statistics: Num rows: 82510879939 Data size: 7203833257964 Basic stats: COMPLETE Column stats: NONE
                        Map Join Operator
                          condition map:
                               Inner Join 0 to 1
                          condition expressions:
                            0 {ss_sold_date_sk} {ss_item_sk} {ss_ext_sales_price}
                            1 {_col0} {_col1} {_col25}
                          keys:
                            0
                            1
                          outputColumnNames: _col0, _col2, _col15, _col27, _col28, _col52
                          input vertices:
                            1 Map 1
                          Statistics: Num rows: 90761969664 Data size: 7924217282560 Basic stats: COMPLETE Column stats: NONE
                          Filter Operator
                            predicate: ((_col2 = _col27) and (_col0 = _col52)) (type: boolean)
                            Statistics: Num rows: 22690492416 Data size: 1981054320640 Basic stats: COMPLETE Column stats: NONE
                            Select Operator
                              expressions: _col15 (type: float), _col28 (type: string)
                              outputColumnNames: _col15, _col59
                              Statistics: Num rows: 22690492416 Data size: 1981054320640 Basic stats: COMPLETE Column stats: NONE
                              Map Join Operator
                                condition map:
                                     Left Semi Join 0 to 1
                                condition expressions:
                                  0 {_col15} {_col59}
                                  1
                                keys:
                                  0 _col59 (type: string)
                                  1 _col0 (type: string)
                                outputColumnNames: _col15, _col59
                                input vertices:
                                  1 Map 2
                                Statistics: Num rows: 24959541248 Data size: 2179159818240 Basic stats: COMPLETE Column stats: NONE
                                Select Operator
                                  expressions: _col59 (type: string), _col15 (type: float)
                                  outputColumnNames: _col0, _col1
                                  Statistics: Num rows: 24959541248 Data size: 2179159818240 Basic stats: COMPLETE Column stats: NONE
                                  Group By Operator
                                    aggregations: sum(_col1)
                                    keys: _col0 (type: string)
                                    mode: hash
                                    outputColumnNames: _col0, _col1
                                    Statistics: Num rows: 24959541248 Data size: 2179159818240 Basic stats: COMPLETE Column stats: NONE
                                    Reduce Output Operator
                                      key expressions: _col0 (type: string)
                                      sort order: +
                                      Map-reduce partition columns: _col0 (type: string)
                                      Statistics: Num rows: 24959541248 Data size: 2179159818240 Basic stats: COMPLETE Column stats: NONE
                                      value expressions: _col1 (type: double)
                  Execution mode: vectorized
              Reducer 5
                  Reduce Operator Tree:
                    Group By Operator
                      aggregations: sum(VALUE._col0)
                      keys: KEY._col0 (type: string)
                      mode: mergepartial
                      outputColumnNames: _col0, _col1
                      Statistics: Num rows: 12479770624 Data size: 1089579909120 Basic stats: COMPLETE Column stats: NONE
                      Select Operator
                        expressions: _col0 (type: string), _col1 (type: double)
                        outputColumnNames: _col0, _col1
                        Statistics: Num rows: 12479770624 Data size: 1089579909120 Basic stats: COMPLETE Column stats: NONE
                        File Output Operator
                          compressed: false
                          Statistics: Num rows: 12479770624 Data size: 1089579909120 Basic stats: COMPLETE Column stats: NONE
                          table:
                              input format: org.apache.hadoop.mapred.TextInputFormat
                              output format: org.apache.hadoop.hive.ql.io.HiveIgnoreKeyTextOutputFormat
                              serde: org.apache.hadoop.hive.serde2.lazy.LazySimpleSerDe
      
        Stage: Stage-0
          Fetch Operator
            limit: -1
            Processor Tree:
              ListSink
      

      No cross product generated for this query

       select i_item_id,sum(ss_ext_sales_price) total_sales
       from
              store_sales,
              date_dim,
               item
      where i_color in ('purple','burlywood','indian')
       and     ss_item_sk              = i_item_sk
       and     ss_sold_date_sk         = d_date_sk
       and     d_year                  = 2001
       and     d_moy                   = 1
       group by i_item_id;
      
      

      Attachments

        1. HIVE-7985.patch
          44 kB
          Laljo John Pullokkaran
        2. HIVE-7985.1.patch
          44 kB
          Laljo John Pullokkaran
        3. HIVE-7985.2.patch
          44 kB
          Gunther Hagleitner

        Issue Links

          Activity

            People

              jpullokkaran Laljo John Pullokkaran
              mmokhtar Mostafa Mokhtar
              Votes:
              0 Vote for this issue
              Watchers:
              6 Start watching this issue

              Dates

                Created:
                Updated:
                Resolved: