Uploaded image for project: 'Hive'
  1. Hive
  2. HIVE-9604

CBO : Presence of hybrid join condition sets of join order optimizations

    XMLWordPrintableJSON

Details

    • New Feature
    • Status: Open
    • Major
    • Resolution: Unresolved
    • 0.14.0
    • None
    • CBO
    • None

    Description

      When a query has a join between two tables on hybrid join condition (conjunction + disjunction) the resulting join order from CBO is suboptimal.
      Re-writing the query results in 9x performance improvement.

      This was observed in several TPC-DS queries like Q72 and Q64.

      The culprit join conditions are :

      catalog_sales.cs_item_sk = inventory.inv_item_sk
        and inv_quantity_on_hand < cs_quantity 
      

      This is a simplified version of Q72.

      select  count(*) total_cnt
      from catalog_sales
      join inventory on (catalog_sales.cs_item_sk = inventory.inv_item_sk)
      join warehouse on (warehouse.w_warehouse_sk=inventory.inv_warehouse_sk)
      join item on (item.i_item_sk = catalog_sales.cs_item_sk)
      join customer_demographics on (catalog_sales.cs_bill_cdemo_sk = customer_demographics.cd_demo_sk)
      join household_demographics on (catalog_sales.cs_bill_hdemo_sk = household_demographics.hd_demo_sk)
      join date_dim d1 on (catalog_sales.cs_sold_date_sk = d1.d_date_sk)
      join date_dim d2 on (inventory.inv_date_sk = d2.d_date_sk)
      join date_dim d3 on (catalog_sales.cs_ship_date_sk = d3.d_date_sk)
      where d1.d_week_seq = d2.d_week_seq
        and inv_quantity_on_hand < cs_quantity 
        and d3.d_date > d1.d_date + 5
        and hd_buy_potential = '1001-5000'
        and d1.d_year = 2001
        and hd_buy_potential = '1001-5000'
        and cd_marital_status = 'M'
        and d1.d_year = 2001;
      

      If the inventory table is moved down in the join order a more efficient plan is generated

      Modified query

      select  count(*) total_cnt
      from catalog_sales
      join item on (item.i_item_sk = catalog_sales.cs_item_sk)
      join customer_demographics on (catalog_sales.cs_bill_cdemo_sk = customer_demographics.cd_demo_sk)
      join household_demographics on (catalog_sales.cs_bill_hdemo_sk = household_demographics.hd_demo_sk)
      join date_dim d1 on (catalog_sales.cs_sold_date_sk = d1.d_date_sk)
      join date_dim d3 on (catalog_sales.cs_ship_date_sk = d3.d_date_sk)
      join inventory on (catalog_sales.cs_item_sk = inventory.inv_item_sk)
      join warehouse on (warehouse.w_warehouse_sk=inventory.inv_warehouse_sk)
      join date_dim d2 on (inventory.inv_date_sk = d2.d_date_sk)
      where d1.d_week_seq = d2.d_week_seq
        and inv_quantity_on_hand < cs_quantity 
        and d3.d_date > d1.d_date + 5
        and hd_buy_potential = '1001-5000'
        and d1.d_year = 2001
        and hd_buy_potential = '1001-5000'
        and cd_marital_status = 'M'
        and d1.d_year = 2001;
      

      Plan with base query notice how catalog_sales joins with inventory first

      STAGE PLANS:
        Stage: Stage-1
          Tez
            Edges:
              Map 3 <- Map 1 (BROADCAST_EDGE), Map 10 (BROADCAST_EDGE), Map 11 (BROADCAST_EDGE), Map 12 (BROADCAST_EDGE), Map 13 (BROADCAST_EDGE), Map 2 (BROADCAST_EDGE), Map 6 (BROADCAST_EDGE), Map 7 (BROADCAST_EDGE), Map 8 (BROADCAST_EDGE), Map 9 (BROADCAST_EDGE)
              Reducer 4 <- Map 3 (SIMPLE_EDGE)
              Reducer 5 <- Reducer 4 (SIMPLE_EDGE)
            DagName: mmokhtar_20141015151414_a08eae06-7250-4833-9e1d-8e58eb69780e:1
            Vertices:
              Map 1 
                  Map Operator Tree:
                      TableScan
                        alias: d1
                        filterExpr: (d_date_sk is not null and d_week_seq is not null) (type: boolean)
                        Statistics: Num rows: 73049 Data size: 81741831 Basic stats: COMPLETE Column stats: COMPLETE
                        Filter Operator
                          predicate: (d_date_sk is not null and d_week_seq is not null) (type: boolean)
                          Statistics: Num rows: 73049 Data size: 584392 Basic stats: COMPLETE Column stats: COMPLETE
                          Select Operator
                            expressions: d_date_sk (type: int), d_week_seq (type: int)
                            outputColumnNames: _col0, _col1
                            Statistics: Num rows: 73049 Data size: 584392 Basic stats: COMPLETE Column stats: COMPLETE
                            Reduce Output Operator
                              key expressions: _col0 (type: int), _col1 (type: int)
                              sort order: ++
                              Map-reduce partition columns: _col0 (type: int), _col1 (type: int)
                              Statistics: Num rows: 73049 Data size: 584392 Basic stats: COMPLETE Column stats: COMPLETE
                            Select Operator
                              expressions: _col0 (type: int)
                              outputColumnNames: _col0
                              Statistics: Num rows: 73049 Data size: 0 Basic stats: PARTIAL Column stats: COMPLETE
                              Group By Operator
                                keys: _col0 (type: int)
                                mode: hash
                                outputColumnNames: _col0
                                Statistics: Num rows: 73049 Data size: 0 Basic stats: PARTIAL Column stats: COMPLETE
                                Dynamic Partitioning Event Operator
                                  Target Input: inventory
                                  Partition key expr: inv_date_sk
                                  Statistics: Num rows: 73049 Data size: 0 Basic stats: PARTIAL Column stats: COMPLETE
                                  Target column: inv_date_sk
                                  Target Vertex: Map 7
                  Execution mode: vectorized
              Map 10 
                  Map Operator Tree:
                      TableScan
                        alias: catalog_returns
                        Statistics: Num rows: 28798881 Data size: 2942039156 Basic stats: COMPLETE Column stats: COMPLETE
                        Select Operator
                          expressions: cr_item_sk (type: int), cr_order_number (type: int)
                          outputColumnNames: _col0, _col1
                          Statistics: Num rows: 28798881 Data size: 230391048 Basic stats: COMPLETE Column stats: COMPLETE
                          Reduce Output Operator
                            key expressions: _col0 (type: int), _col1 (type: int)
                            sort order: ++
                            Map-reduce partition columns: _col0 (type: int), _col1 (type: int)
                            Statistics: Num rows: 28798881 Data size: 230391048 Basic stats: COMPLETE Column stats: COMPLETE
                  Execution mode: vectorized
              Map 11 
                  Map Operator Tree:
                      TableScan
                        alias: customer_demographics
                        filterExpr: ((cd_marital_status = 'M') and cd_demo_sk is not null) (type: boolean)
                        Statistics: Num rows: 1920800 Data size: 718379200 Basic stats: COMPLETE Column stats: COMPLETE
                        Filter Operator
                          predicate: ((cd_marital_status = 'M') and cd_demo_sk is not null) (type: boolean)
                          Statistics: Num rows: 274400 Data size: 24421600 Basic stats: COMPLETE Column stats: COMPLETE
                          Select Operator
                            expressions: cd_demo_sk (type: int)
                            outputColumnNames: _col0
                            Statistics: Num rows: 274400 Data size: 1097600 Basic stats: COMPLETE Column stats: COMPLETE
                            Reduce Output Operator
                              key expressions: _col0 (type: int)
                              sort order: +
                              Map-reduce partition columns: _col0 (type: int)
                              Statistics: Num rows: 274400 Data size: 1097600 Basic stats: COMPLETE Column stats: COMPLETE
                  Execution mode: vectorized
              Map 12 
                  Map Operator Tree:
                      TableScan
                        alias: promotion
                        Statistics: Num rows: 450 Data size: 530848 Basic stats: COMPLETE Column stats: COMPLETE
                        Select Operator
                          expressions: p_promo_sk (type: int)
                          outputColumnNames: _col0
                          Statistics: Num rows: 450 Data size: 1800 Basic stats: COMPLETE Column stats: COMPLETE
                          Reduce Output Operator
                            key expressions: _col0 (type: int)
                            sort order: +
                            Map-reduce partition columns: _col0 (type: int)
                            Statistics: Num rows: 450 Data size: 1800 Basic stats: COMPLETE Column stats: COMPLETE
                  Execution mode: vectorized
              Map 13 
                  Map Operator Tree:
                      TableScan
                        alias: household_demographics
                        filterExpr: ((hd_buy_potential = '1001-5000') and hd_demo_sk is not null) (type: boolean)
                        Statistics: Num rows: 7200 Data size: 770400 Basic stats: COMPLETE Column stats: COMPLETE
                        Filter Operator
                          predicate: ((hd_buy_potential = '1001-5000') and hd_demo_sk is not null) (type: boolean)
                          Statistics: Num rows: 1440 Data size: 138240 Basic stats: COMPLETE Column stats: COMPLETE
                          Select Operator
                            expressions: hd_demo_sk (type: int)
                            outputColumnNames: _col0
                            Statistics: Num rows: 1440 Data size: 5760 Basic stats: COMPLETE Column stats: COMPLETE
                            Reduce Output Operator
                              key expressions: _col0 (type: int)
                              sort order: +
                              Map-reduce partition columns: _col0 (type: int)
                              Statistics: Num rows: 1440 Data size: 5760 Basic stats: COMPLETE Column stats: COMPLETE
                  Execution mode: vectorized
              Map 2 
                  Map Operator Tree:
                      TableScan
                        alias: d1
                        filterExpr: d_date_sk is not null (type: boolean)
                        Statistics: Num rows: 73049 Data size: 81741831 Basic stats: COMPLETE Column stats: COMPLETE
                        Filter Operator
                          predicate: d_date_sk is not null (type: boolean)
                          Statistics: Num rows: 73049 Data size: 7158802 Basic stats: COMPLETE Column stats: COMPLETE
                          Select Operator
                            expressions: d_date_sk (type: int), d_date (type: string)
                            outputColumnNames: _col0, _col1
                            Statistics: Num rows: 73049 Data size: 7158802 Basic stats: COMPLETE Column stats: COMPLETE
                            Reduce Output Operator
                              key expressions: _col0 (type: int)
                              sort order: +
                              Map-reduce partition columns: _col0 (type: int)
                              Statistics: Num rows: 73049 Data size: 7158802 Basic stats: COMPLETE Column stats: COMPLETE
                              value expressions: _col1 (type: string)
                  Execution mode: vectorized
              Map 3 
                  Map Operator Tree:
                      TableScan
                        alias: catalog_sales
                        filterExpr: (((cs_item_sk is not null and cs_bill_hdemo_sk is not null) and cs_bill_cdemo_sk is not null) and cs_ship_date_sk is not null) (type: boolean)
                        Statistics: Num rows: 286549727 Data size: 37743959324 Basic stats: COMPLETE Column stats: COMPLETE
                        Filter Operator
                          predicate: (((cs_item_sk is not null and cs_bill_hdemo_sk is not null) and cs_bill_cdemo_sk is not null) and cs_ship_date_sk is not null) (type: boolean)
                          Statistics: Num rows: 284396955 Data size: 7948828532 Basic stats: COMPLETE Column stats: COMPLETE
                          Select Operator
                            expressions: cs_ship_date_sk (type: int), cs_bill_cdemo_sk (type: int), cs_bill_hdemo_sk (type: int), cs_item_sk (type: int), cs_promo_sk (type: int), cs_order_number (type: int), cs_quantity (type: int), cs_sold_date_sk (type: int)
                            outputColumnNames: _col0, _col1, _col2, _col3, _col4, _col5, _col6, _col7
                            Statistics: Num rows: 284396955 Data size: 9086416352 Basic stats: COMPLETE Column stats: COMPLETE
                            Map Join Operator
                              condition map:
                                   Inner Join 0 to 1
                              condition expressions:
                                0 {_col0} {_col1} {_col2} {_col3} {_col4} {_col5} {_col6} {_col7}
                                1 {_col1} {_col2} {_col3}
                              keys:
                                0 _col3 (type: int)
                                1 _col0 (type: int)
                              outputColumnNames: _col0, _col1, _col2, _col3, _col4, _col5, _col6, _col7, _col9, _col10, _col11
                              input vertices:
                                1 Map 7
                              Statistics: Num rows: 275157677926 Data size: 8805045693632 Basic stats: COMPLETE Column stats: COMPLETE
                              Filter Operator
                                predicate: (_col10 < _col6) (type: boolean)
                                Statistics: Num rows: 91719225975 Data size: 2935015231200 Basic stats: COMPLETE Column stats: COMPLETE
                                Select Operator
                                  expressions: _col0 (type: int), _col1 (type: int), _col11 (type: int), _col2 (type: int), _col3 (type: int), _col4 (type: int), _col5 (type: int), _col7 (type: int), _col9 (type: int)
                                  outputColumnNames: _col0, _col1, _col11, _col2, _col3, _col4, _col5, _col7, _col9
                                  Statistics: Num rows: 91719225975 Data size: 2201261423400 Basic stats: COMPLETE Column stats: COMPLETE
                                  Map Join Operator
                                    condition map:
                                         Inner Join 0 to 1
                                    condition expressions:
                                      0 {_col0} {_col1} {_col2} {_col3} {_col4} {_col5} {_col9} {_col11}
                                      1 {_col1} {_col2}
                                    keys:
                                      0 _col7 (type: int)
                                      1 _col0 (type: int)
                                    outputColumnNames: _col0, _col1, _col2, _col3, _col4, _col5, _col9, _col11, _col13, _col14
                                    input vertices:
                                      1 Map 8
                                    Statistics: Num rows: 102398861876 Data size: 12083065701368 Basic stats: COMPLETE Column stats: COMPLETE
                                    Map Join Operator
                                      condition map:
                                           Inner Join 0 to 1
                                      condition expressions:
                                        0 {_col0} {_col1} {_col2} {_col3} {_col4} {_col5} {_col9} {_col13} {_col14}
                                        1 
                                      keys:
                                        0 _col11 (type: int), _col14 (type: int)
                                        1 _col0 (type: int), _col1 (type: int)
                                      outputColumnNames: _col0, _col1, _col2, _col3, _col4, _col5, _col9, _col13, _col14
                                      input vertices:
                                        1 Map 1
                                      Statistics: Num rows: 8705458 Data size: 992422212 Basic stats: COMPLETE Column stats: COMPLETE
                                      Map Join Operator
                                        condition map:
                                             Inner Join 0 to 1
                                        condition expressions:
                                          0 {_col0} {_col1} {_col3} {_col4} {_col5} {_col9} {_col13} {_col14}
                                          1 
                                        keys:
                                          0 _col2 (type: int)
                                          1 _col0 (type: int)
                                        outputColumnNames: _col0, _col1, _col3, _col4, _col5, _col9, _col13, _col14
                                        input vertices:
                                          1 Map 13
                                        Statistics: Num rows: 6739709 Data size: 768326826 Basic stats: COMPLETE Column stats: COMPLETE
                                        Select Operator
                                          expressions: _col0 (type: int), _col1 (type: int), _col13 (type: string), _col14 (type: int), _col3 (type: int), _col4 (type: int), _col5 (type: int), _col9 (type: int)
                                          outputColumnNames: _col0, _col1, _col13, _col14, _col3, _col4, _col5, _col9
                                          Statistics: Num rows: 6739709 Data size: 768326826 Basic stats: COMPLETE Column stats: COMPLETE
                                          Map Join Operator
                                            condition map:
                                                 Inner Join 0 to 1
                                            condition expressions:
                                              0 
                                              1 {_col0} {_col3} {_col4} {_col5} {_col9} {_col13} {_col14}
                                            keys:
                                              0 _col0 (type: int)
                                              1 _col1 (type: int)
                                            outputColumnNames: _col2, _col5, _col6, _col7, _col11, _col15, _col16
                                            input vertices:
                                              0 Map 11
                                            Statistics: Num rows: 7051609 Data size: 803883426 Basic stats: COMPLETE Column stats: COMPLETE
                                            Select Operator
                                              expressions: _col11 (type: int), _col15 (type: string), _col16 (type: int), _col2 (type: int), _col5 (type: int), _col6 (type: int), _col7 (type: int)
                                              outputColumnNames: _col11, _col15, _col16, _col2, _col5, _col6, _col7
                                              Statistics: Num rows: 7051609 Data size: 803883426 Basic stats: COMPLETE Column stats: COMPLETE
                                              Map Join Operator
                                                condition map:
                                                     Inner Join 0 to 1
                                                condition expressions:
                                                  0 {_col1}
                                                  1 {_col5} {_col6} {_col7} {_col11} {_col15} {_col16}
                                                keys:
                                                  0 _col0 (type: int)
                                                  1 _col2 (type: int)
                                                outputColumnNames: _col1, _col7, _col8, _col9, _col13, _col17, _col18
                                                input vertices:
                                                  0 Map 2
                                                Statistics: Num rows: 7884543 Data size: 1639984944 Basic stats: COMPLETE Column stats: COMPLETE
                                                Filter Operator
                                                  predicate: (UDFToDouble(_col1) > (UDFToDouble(_col17) + UDFToDouble(5))) (type: boolean)
                                                  Statistics: Num rows: 2628181 Data size: 546661648 Basic stats: COMPLETE Column stats: COMPLETE
                                                  Select Operator
                                                    expressions: _col13 (type: int), _col18 (type: int), _col7 (type: int), _col8 (type: int), _col9 (type: int)
                                                    outputColumnNames: _col13, _col18, _col7, _col8, _col9
                                                    Statistics: Num rows: 2628181 Data size: 52563620 Basic stats: COMPLETE Column stats: COMPLETE
                                                    Map Join Operator
                                                      condition map:
                                                           Inner Join 0 to 1
                                                      condition expressions:
                                                        0 {_col1}
                                                        1 {_col7} {_col8} {_col9} {_col13} {_col18}
                                                      keys:
                                                        0 _col0 (type: int)
                                                        1 _col7 (type: int)
                                                      outputColumnNames: _col1, _col9, _col10, _col11, _col15, _col20
                                                      input vertices:
                                                        0 Map 6
                                                      Statistics: Num rows: 29222304 Data size: 5961350016 Basic stats: COMPLETE Column stats: COMPLETE
                                                      Map Join Operator
                                                        condition map:
                                                             Inner Join 0 to 1
                                                        condition expressions:
                                                          0 {_col1} {_col9} {_col10} {_col11} {_col20}
                                                          1 {_col1}
                                                        keys:
                                                          0 _col15 (type: int)
                                                          1 _col0 (type: int)
                                                        outputColumnNames: _col1, _col9, _col10, _col11, _col20, _col27
                                                        input vertices:
                                                          1 Map 9
                                                        Statistics: Num rows: 35066764 Data size: 10484962436 Basic stats: COMPLETE Column stats: COMPLETE
                                                        Select Operator
                                                          expressions: _col27 (type: string), _col1 (type: string), _col20 (type: int), _col9 (type: int), _col10 (type: int), _col11 (type: int)
                                                          outputColumnNames: _col13, _col15, _col22, _col3, _col4, _col5
                                                          Statistics: Num rows: 35066764 Data size: 10484962436 Basic stats: COMPLETE Column stats: COMPLETE
                                                          Map Join Operator
                                                            condition map:
                                                                 Right Outer Join0 to 1
                                                            condition expressions:
                                                              0 {_col0}
                                                              1 {_col3} {_col5} {_col13} {_col15} {_col22}
                                                            keys:
                                                              0 _col0 (type: int)
                                                              1 _col4 (type: int)
                                                            outputColumnNames: _col0, _col4, _col6, _col14, _col16, _col23
                                                            input vertices:
                                                              0 Map 12
                                                            Statistics: Num rows: 631201752 Data size: 188729323848 Basic stats: COMPLETE Column stats: COMPLETE
                                                            Select Operator
                                                              expressions: _col0 (type: int), _col14 (type: string), _col16 (type: string), _col23 (type: int), _col4 (type: int), _col6 (type: int)
                                                              outputColumnNames: _col0, _col14, _col16, _col23, _col4, _col6
                                                              Statistics: Num rows: 631201752 Data size: 188729323848 Basic stats: COMPLETE Column stats: COMPLETE
                                                              Map Join Operator
                                                                condition map:
                                                                     Right Outer Join0 to 1
                                                                condition expressions:
                                                                  0 
                                                                  1 {_col0} {_col14} {_col16} {_col23}
                                                                keys:
                                                                  0 _col0 (type: int), _col1 (type: int)
                                                                  1 _col4 (type: int), _col6 (type: int)
                                                                outputColumnNames: _col2, _col16, _col18, _col25
                                                                input vertices:
                                                                  0 Map 10
                                                                Statistics: Num rows: 610256859 Data size: 177584745969 Basic stats: COMPLETE Column stats: COMPLETE
                                                                Select Operator
                                                                  expressions: _col18 (type: string), _col16 (type: string), _col25 (type: int), CASE WHEN (_col2 is null) THEN (1) ELSE (0) END (type: int), CASE WHEN (_col2 is not null) THEN (1) ELSE (0) END (type: int)
                                                                  outputColumnNames: _col0, _col1, _col2, _col3, _col4
                                                                  Statistics: Num rows: 610256859 Data size: 177584745969 Basic stats: COMPLETE Column stats: COMPLETE
                                                                  Group By Operator
                                                                    aggregations: count(_col3), count(_col4), count()
                                                                    keys: _col0 (type: string), _col1 (type: string), _col2 (type: int)
                                                                    mode: hash
                                                                    outputColumnNames: _col0, _col1, _col2, _col3, _col4, _col5
                                                                    Statistics: Num rows: 44400 Data size: 1243200 Basic stats: COMPLETE Column stats: COMPLETE
                                                                    Reduce Output Operator
                                                                      key expressions: _col0 (type: string), _col1 (type: string), _col2 (type: int)
                                                                      sort order: +++
                                                                      Map-reduce partition columns: _col0 (type: string), _col1 (type: string), _col2 (type: int)
                                                                      Statistics: Num rows: 44400 Data size: 1243200 Basic stats: COMPLETE Column stats: COMPLETE
                                                                      value expressions: _col3 (type: bigint), _col4 (type: bigint), _col5 (type: bigint)
                  Execution mode: vectorized
              Map 6 
                  Map Operator Tree:
                      TableScan
                        alias: item
                        filterExpr: i_item_sk is not null (type: boolean)
                        Statistics: Num rows: 48000 Data size: 68732712 Basic stats: COMPLETE Column stats: COMPLETE
                        Filter Operator
                          predicate: i_item_sk is not null (type: boolean)
                          Statistics: Num rows: 48000 Data size: 9024000 Basic stats: COMPLETE Column stats: COMPLETE
                          Select Operator
                            expressions: i_item_sk (type: int), i_item_desc (type: string)
                            outputColumnNames: _col0, _col1
                            Statistics: Num rows: 48000 Data size: 9024000 Basic stats: COMPLETE Column stats: COMPLETE
                            Reduce Output Operator
                              key expressions: _col0 (type: int)
                              sort order: +
                              Map-reduce partition columns: _col0 (type: int)
                              Statistics: Num rows: 48000 Data size: 9024000 Basic stats: COMPLETE Column stats: COMPLETE
                              value expressions: _col1 (type: string)
                  Execution mode: vectorized
              Map 7 
                  Map Operator Tree:
                      TableScan
                        alias: inventory
                        filterExpr: (inv_item_sk is not null and inv_warehouse_sk is not null) (type: boolean)
                        Statistics: Num rows: 37584000 Data size: 443485104 Basic stats: COMPLETE Column stats: COMPLETE
                        Filter Operator
                          predicate: (inv_item_sk is not null and inv_warehouse_sk is not null) (type: boolean)
                          Statistics: Num rows: 37584000 Data size: 443485104 Basic stats: COMPLETE Column stats: COMPLETE
                          Select Operator
                            expressions: inv_item_sk (type: int), inv_warehouse_sk (type: int), inv_quantity_on_hand (type: int), inv_date_sk (type: int)
                            outputColumnNames: _col0, _col1, _col2, _col3
                            Statistics: Num rows: 37584000 Data size: 593821104 Basic stats: COMPLETE Column stats: COMPLETE
                            Reduce Output Operator
                              key expressions: _col0 (type: int)
                              sort order: +
                              Map-reduce partition columns: _col0 (type: int)
                              Statistics: Num rows: 37584000 Data size: 593821104 Basic stats: COMPLETE Column stats: COMPLETE
                              value expressions: _col1 (type: int), _col2 (type: int), _col3 (type: int)
                  Execution mode: vectorized
              Map 8 
                  Map Operator Tree:
                      TableScan
                        alias: d1
                        filterExpr: (((d_year = 2001) and d_date_sk is not null) and d_week_seq is not null) (type: boolean)
                        Statistics: Num rows: 73049 Data size: 81741831 Basic stats: COMPLETE Column stats: COMPLETE
                        Filter Operator
                          predicate: (((d_year = 2001) and d_date_sk is not null) and d_week_seq is not null) (type: boolean)
                          Statistics: Num rows: 652 Data size: 69112 Basic stats: COMPLETE Column stats: COMPLETE
                          Select Operator
                            expressions: d_date_sk (type: int), d_date (type: string), d_week_seq (type: int)
                            outputColumnNames: _col0, _col1, _col2
                            Statistics: Num rows: 652 Data size: 66504 Basic stats: COMPLETE Column stats: COMPLETE
                            Reduce Output Operator
                              key expressions: _col0 (type: int)
                              sort order: +
                              Map-reduce partition columns: _col0 (type: int)
                              Statistics: Num rows: 652 Data size: 66504 Basic stats: COMPLETE Column stats: COMPLETE
                              value expressions: _col1 (type: string), _col2 (type: int)
                            Select Operator
                              expressions: _col0 (type: int)
                              outputColumnNames: _col0
                              Statistics: Num rows: 652 Data size: 0 Basic stats: PARTIAL Column stats: COMPLETE
                              Group By Operator
                                keys: _col0 (type: int)
                                mode: hash
                                outputColumnNames: _col0
                                Statistics: Num rows: 652 Data size: 0 Basic stats: PARTIAL Column stats: COMPLETE
                                Dynamic Partitioning Event Operator
                                  Target Input: catalog_sales
                                  Partition key expr: cs_sold_date_sk
                                  Statistics: Num rows: 652 Data size: 0 Basic stats: PARTIAL Column stats: COMPLETE
                                  Target column: cs_sold_date_sk
                                  Target Vertex: Map 3
                  Execution mode: vectorized
              Map 9 
                  Map Operator Tree:
                      TableScan
                        alias: warehouse
                        filterExpr: w_warehouse_sk is not null (type: boolean)
                        Statistics: Num rows: 6 Data size: 6166 Basic stats: COMPLETE Column stats: COMPLETE
                        Filter Operator
                          predicate: w_warehouse_sk is not null (type: boolean)
                          Statistics: Num rows: 6 Data size: 618 Basic stats: COMPLETE Column stats: COMPLETE
                          Select Operator
                            expressions: w_warehouse_sk (type: int), w_warehouse_name (type: string)
                            outputColumnNames: _col0, _col1
                            Statistics: Num rows: 6 Data size: 618 Basic stats: COMPLETE Column stats: COMPLETE
                            Reduce Output Operator
                              key expressions: _col0 (type: int)
                              sort order: +
                              Map-reduce partition columns: _col0 (type: int)
                              Statistics: Num rows: 6 Data size: 618 Basic stats: COMPLETE Column stats: COMPLETE
                              value expressions: _col1 (type: string)
                  Execution mode: vectorized
              Reducer 4 
                  Reduce Operator Tree:
                    Group By Operator
                      aggregations: count(VALUE._col0), count(VALUE._col1), count(VALUE._col2)
                      keys: KEY._col0 (type: string), KEY._col1 (type: string), KEY._col2 (type: int)
                      mode: mergepartial
                      outputColumnNames: _col0, _col1, _col2, _col3, _col4, _col5
                      Statistics: Num rows: 44400 Data size: 2131200 Basic stats: COMPLETE Column stats: COMPLETE
                      Select Operator
                        expressions: _col0 (type: string), _col1 (type: string), _col2 (type: int), _col3 (type: bigint), _col4 (type: bigint), _col5 (type: bigint)
                        outputColumnNames: _col0, _col1, _col2, _col3, _col4, _col5
                        Statistics: Num rows: 44400 Data size: 2131200 Basic stats: COMPLETE Column stats: COMPLETE
                        Reduce Output Operator
                          key expressions: _col5 (type: bigint), _col0 (type: string), _col1 (type: string), _col2 (type: int)
                          sort order: -+++
                          Statistics: Num rows: 44400 Data size: 2131200 Basic stats: COMPLETE Column stats: COMPLETE
                          TopN Hash Memory Usage: 0.04
                          value expressions: _col3 (type: bigint), _col4 (type: bigint)
                  Execution mode: vectorized
              Reducer 5 
                  Reduce Operator Tree:
                    Select Operator
                      expressions: KEY.reducesinkkey1 (type: string), KEY.reducesinkkey2 (type: string), KEY.reducesinkkey3 (type: int), VALUE._col0 (type: bigint), VALUE._col1 (type: bigint), KEY.reducesinkkey0 (type: bigint)
                      outputColumnNames: _col0, _col1, _col2, _col3, _col4, _col5
                      Statistics: Num rows: 44400 Data size: 2131200 Basic stats: COMPLETE Column stats: COMPLETE
                      Limit
                        Number of rows: 100
                        Statistics: Num rows: 100 Data size: 4800 Basic stats: COMPLETE Column stats: COMPLETE
                        File Output Operator
                          compressed: false
                          Statistics: Num rows: 100 Data size: 4800 Basic stats: COMPLETE Column stats: COMPLETE
                          table:
                              input format: org.apache.hadoop.mapred.TextInputFormat
                              output format: org.apache.hadoop.hive.ql.io.HiveIgnoreKeyTextOutputFormat
                              serde: org.apache.hadoop.hive.serde2.lazy.LazySimpleSerDe
                  Execution mode: vectorized
      
        Stage: Stage-0
          Fetch Operator
            limit: 100
            Processor Tree:
              ListSink
      
      Time taken: 16.504 seconds, Fetched: 463 row(s)
      
      

      Plan with modified query

      STAGE DEPENDENCIES:
        Stage-1 is a root stage
        Stage-0 depends on stages: Stage-1
      
      STAGE PLANS:
        Stage: Stage-1
          Tez
            Edges:
              Map 2 <- Map 1 (BROADCAST_EDGE), Map 10 (BROADCAST_EDGE), Map 7 (BROADCAST_EDGE)
              Map 7 <- Map 4 (BROADCAST_EDGE), Map 5 (BROADCAST_EDGE), Map 6 (BROADCAST_EDGE), Map 8 (BROADCAST_EDGE), Map 9 (BROADCAST_EDGE)
              Reducer 3 <- Map 2 (SIMPLE_EDGE)
            DagName: mmokhtar_20150206180606_2f891814-bf41-4888-ab80-dddf4425205f:1
            Vertices:
              Map 1
                  Map Operator Tree:
                      TableScan
                        alias: d1
                        filterExpr: (d_date_sk is not null and d_week_seq is not null) (type: boolean)
                        Statistics: Num rows: 73049 Data size: 81741831 Basic stats: COMPLETE Column stats: COMPLETE
                        Filter Operator
                          predicate: (d_date_sk is not null and d_week_seq is not null) (type: boolean)
                          Statistics: Num rows: 73049 Data size: 584392 Basic stats: COMPLETE Column stats: COMPLETE
                          Select Operator
                            expressions: d_date_sk (type: int), d_week_seq (type: int)
                            outputColumnNames: _col0, _col1
                            Statistics: Num rows: 73049 Data size: 584392 Basic stats: COMPLETE Column stats: COMPLETE
                            Reduce Output Operator
                              key expressions: _col0 (type: int), _col1 (type: int)
                              sort order: ++
                              Map-reduce partition columns: _col0 (type: int), _col1 (type: int)
                              Statistics: Num rows: 73049 Data size: 584392 Basic stats: COMPLETE Column stats: COMPLETE
                            Select Operator
                              expressions: _col0 (type: int)
                              outputColumnNames: _col0
                              Statistics: Num rows: 73049 Data size: 584392 Basic stats: COMPLETE Column stats: COMPLETE
                              Group By Operator
                                keys: _col0 (type: int)
                                mode: hash
                                outputColumnNames: _col0
                                Statistics: Num rows: 73049 Data size: 584392 Basic stats: COMPLETE Column stats: COMPLETE
                                Dynamic Partitioning Event Operator
                                  Target Input: inventory
                                  Partition key expr: inv_date_sk
                                  Statistics: Num rows: 73049 Data size: 584392 Basic stats: COMPLETE Column stats: COMPLETE
                                  Target column: inv_date_sk
                                  Target Vertex: Map 2
              Map 10
                  Map Operator Tree:
                      TableScan
                        alias: warehouse
                        filterExpr: w_warehouse_sk is not null (type: boolean)
                        Statistics: Num rows: 6 Data size: 6166 Basic stats: COMPLETE Column stats: COMPLETE
                        Filter Operator
                          predicate: w_warehouse_sk is not null (type: boolean)
                          Statistics: Num rows: 6 Data size: 24 Basic stats: COMPLETE Column stats: COMPLETE
                          Select Operator
                            expressions: w_warehouse_sk (type: int)
                            outputColumnNames: _col0
                            Statistics: Num rows: 6 Data size: 24 Basic stats: COMPLETE Column stats: COMPLETE
                            Reduce Output Operator
                              key expressions: _col0 (type: int)
                              sort order: +
                              Map-reduce partition columns: _col0 (type: int)
                              Statistics: Num rows: 6 Data size: 24 Basic stats: COMPLETE Column stats: COMPLETE
              Map 2
                  Map Operator Tree:
                      TableScan
                        alias: inventory
                        filterExpr: (inv_item_sk is not null and inv_warehouse_sk is not null) (type: boolean)
                        Statistics: Num rows: 37584000 Data size: 443485104 Basic stats: COMPLETE Column stats: COMPLETE
                        Filter Operator
                          predicate: (inv_item_sk is not null and inv_warehouse_sk is not null) (type: boolean)
                          Statistics: Num rows: 37584000 Data size: 593821104 Basic stats: COMPLETE Column stats: COMPLETE
                          Select Operator
                            expressions: inv_item_sk (type: int), inv_warehouse_sk (type: int), inv_quantity_on_hand (type: int), inv_date_sk (type: int)
                            outputColumnNames: _col0, _col1, _col2, _col3
                            Statistics: Num rows: 37584000 Data size: 593821104 Basic stats: COMPLETE Column stats: COMPLETE
                            Map Join Operator
                              condition map:
                                   Inner Join 0 to 1
                              keys:
                                0 _col0 (type: int)
                                1 _col3 (type: int)
                              outputColumnNames: _col1, _col2, _col3, _col8, _col17
                              input vertices:
                                1 Map 7
                              Statistics: Num rows: 937245073 Data size: 14995921168 Basic stats: COMPLETE Column stats: COMPLETE
                              Filter Operator
                                predicate: (_col2 < _col8) (type: boolean)
                                Statistics: Num rows: 312415024 Data size: 4998640384 Basic stats: COMPLETE Column stats: COMPLETE
                                Select Operator
                                  expressions: _col17 (type: int), _col1 (type: int), _col3 (type: int)
                                  outputColumnNames: _col13, _col18, _col20
                                  Statistics: Num rows: 312415024 Data size: 2499320192 Basic stats: COMPLETE Column stats: COMPLETE
                                  Map Join Operator
                                    condition map:
                                         Inner Join 0 to 1
                                    keys:
                                      0 _col0 (type: int), _col1 (type: int)
                                      1 _col20 (type: int), _col13 (type: int)
                                    outputColumnNames: _col20
                                    input vertices:
                                      0 Map 1
                                    Statistics: Num rows: 26560 Data size: 106240 Basic stats: COMPLETE Column stats: COMPLETE
                                    Map Join Operator
                                      condition map:
                                           Inner Join 0 to 1
                                      keys:
                                        0 _col20 (type: int)
                                        1 _col0 (type: int)
                                      input vertices:
                                        1 Map 10
                                      Statistics: Num rows: 26560 Data size: 0 Basic stats: PARTIAL Column stats: COMPLETE
                                      Group By Operator
                                        aggregations: count()
                                        mode: hash
                                        outputColumnNames: _col0
                                        Statistics: Num rows: 1 Data size: 8 Basic stats: COMPLETE Column stats: COMPLETE
                                        Reduce Output Operator
                                          sort order:
                                          Statistics: Num rows: 1 Data size: 8 Basic stats: COMPLETE Column stats: COMPLETE
                                          value expressions: _col0 (type: bigint)
              Map 4
                  Map Operator Tree:
                      TableScan
                        alias: d1
                        filterExpr: d_date_sk is not null (type: boolean)
                        Statistics: Num rows: 73049 Data size: 81741831 Basic stats: COMPLETE Column stats: COMPLETE
                        Filter Operator
                          predicate: d_date_sk is not null (type: boolean)
                          Statistics: Num rows: 73049 Data size: 7158802 Basic stats: COMPLETE Column stats: COMPLETE
                          Select Operator
                            expressions: d_date_sk (type: int), d_date (type: string)
                            outputColumnNames: _col0, _col1
                            Statistics: Num rows: 73049 Data size: 7158802 Basic stats: COMPLETE Column stats: COMPLETE
                            Reduce Output Operator
                              key expressions: _col0 (type: int)
                              sort order: +
                              Map-reduce partition columns: _col0 (type: int)
                              Statistics: Num rows: 73049 Data size: 7158802 Basic stats: COMPLETE Column stats: COMPLETE
                              value expressions: _col1 (type: string)
              Map 5
                  Map Operator Tree:
                      TableScan
                        alias: item
                        filterExpr: i_item_sk is not null (type: boolean)
                        Statistics: Num rows: 48000 Data size: 68732712 Basic stats: COMPLETE Column stats: COMPLETE
                        Filter Operator
                          predicate: i_item_sk is not null (type: boolean)
                          Statistics: Num rows: 48000 Data size: 192000 Basic stats: COMPLETE Column stats: COMPLETE
                          Select Operator
                            expressions: i_item_sk (type: int)
                            outputColumnNames: _col0
                            Statistics: Num rows: 48000 Data size: 192000 Basic stats: COMPLETE Column stats: COMPLETE
                            Reduce Output Operator
                              key expressions: _col0 (type: int)
                              sort order: +
                              Map-reduce partition columns: _col0 (type: int)
                              Statistics: Num rows: 48000 Data size: 192000 Basic stats: COMPLETE Column stats: COMPLETE
              Map 6
                  Map Operator Tree:
                      TableScan
                        alias: customer_demographics
                        filterExpr: ((cd_marital_status = 'M') and cd_demo_sk is not null) (type: boolean)
                        Statistics: Num rows: 1920800 Data size: 718379200 Basic stats: COMPLETE Column stats: COMPLETE
                        Filter Operator
                          predicate: ((cd_marital_status = 'M') and cd_demo_sk is not null) (type: boolean)
                          Statistics: Num rows: 274400 Data size: 24421600 Basic stats: COMPLETE Column stats: COMPLETE
                          Select Operator
                            expressions: cd_demo_sk (type: int)
                            outputColumnNames: _col0
                            Statistics: Num rows: 274400 Data size: 1097600 Basic stats: COMPLETE Column stats: COMPLETE
                            Reduce Output Operator
                              key expressions: _col0 (type: int)
                              sort order: +
                              Map-reduce partition columns: _col0 (type: int)
                              Statistics: Num rows: 274400 Data size: 1097600 Basic stats: COMPLETE Column stats: COMPLETE
              Map 7
                  Map Operator Tree:
                      TableScan
                        alias: catalog_sales
                        filterExpr: (((cs_bill_hdemo_sk is not null and cs_bill_cdemo_sk is not null) and cs_item_sk is not null) and cs_ship_date_sk is not null) (type: boolean)
                        Statistics: Num rows: 286549727 Data size: 37743959324 Basic stats: COMPLETE Column stats: COMPLETE
                        Filter Operator
                          predicate: (((cs_bill_hdemo_sk is not null and cs_bill_cdemo_sk is not null) and cs_item_sk is not null) and cs_ship_date_sk is not null) (type: boolean)
                          Statistics: Num rows: 284396955 Data size: 6814100200 Basic stats: COMPLETE Column stats: COMPLETE
                          Select Operator
                            expressions: cs_ship_date_sk (type: int), cs_bill_cdemo_sk (type: int), cs_bill_hdemo_sk (type: int), cs_item_sk (type: int), cs_quantity (type: int), cs_sold_date_sk (type: int)
                            outputColumnNames: _col0, _col1, _col2, _col3, _col4, _col5
                            Statistics: Num rows: 284396955 Data size: 6814100200 Basic stats: COMPLETE Column stats: COMPLETE
                            Map Join Operator
                              condition map:
                                   Inner Join 0 to 1
                              keys:
                                0 _col5 (type: int)
                                1 _col0 (type: int)
                              outputColumnNames: _col0, _col1, _col2, _col3, _col4, _col7, _col8
                              input vertices:
                                1 Map 8
                              Statistics: Num rows: 101715202 Data size: 12002393836 Basic stats: COMPLETE Column stats: COMPLETE
                              Map Join Operator
                                condition map:
                                     Inner Join 0 to 1
                                keys:
                                  0 _col2 (type: int)
                                  1 _col0 (type: int)
                                outputColumnNames: _col0, _col1, _col3, _col4, _col7, _col8
                                input vertices:
                                  1 Map 9
                                Statistics: Num rows: 20343040 Data size: 2319106560 Basic stats: COMPLETE Column stats: COMPLETE
                                Map Join Operator
                                  condition map:
                                       Inner Join 0 to 1
                                  keys:
                                    0 _col0 (type: int)
                                    1 _col1 (type: int)
                                  outputColumnNames: _col2, _col5, _col6, _col9, _col10
                                  input vertices:
                                    0 Map 6
                                  Statistics: Num rows: 2906149 Data size: 34873788 Basic stats: COMPLETE Column stats: COMPLETE
                                  Select Operator
                                    expressions: _col10 (type: int), _col2 (type: int), _col5 (type: int), _col6 (type: int), _col9 (type: string)
                                    outputColumnNames: _col10, _col2, _col5, _col6, _col9
                                    Statistics: Num rows: 2906149 Data size: 34873788 Basic stats: COMPLETE Column stats: COMPLETE
                                    Map Join Operator
                                      condition map:
                                           Inner Join 0 to 1
                                      keys:
                                        0 _col0 (type: int)
                                        1 _col5 (type: int)
                                      outputColumnNames: _col3, _col6, _col7, _col10, _col11
                                      input vertices:
                                        0 Map 5
                                      Statistics: Num rows: 2906149 Data size: 34873788 Basic stats: COMPLETE Column stats: COMPLETE
                                      Select Operator
                                        expressions: _col10 (type: string), _col11 (type: int), _col3 (type: int), _col6 (type: int), _col7 (type: int)
                                        outputColumnNames: _col10, _col11, _col3, _col6, _col7
                                        Statistics: Num rows: 2906149 Data size: 34873788 Basic stats: COMPLETE Column stats: COMPLETE
                                        Map Join Operator
                                          condition map:
                                               Inner Join 0 to 1
                                          keys:
                                            0 _col0 (type: int)
                                            1 _col3 (type: int)
                                          outputColumnNames: _col1, _col8, _col9, _col12, _col13
                                          input vertices:
                                            0 Map 4
                                          Statistics: Num rows: 2906149 Data size: 296427198 Basic stats: COMPLETE Column stats: COMPLETE
                                          Filter Operator
                                            predicate: (UDFToDouble(_col1) > (UDFToDouble(_col12) + 5.0)) (type: boolean)
                                            Statistics: Num rows: 968716 Data size: 98809032 Basic stats: COMPLETE Column stats: COMPLETE
                                            Select Operator
                                              expressions: _col13 (type: int), _col8 (type: int), _col9 (type: int)
                                              outputColumnNames: _col13, _col3, _col4
                                              Statistics: Num rows: 968716 Data size: 7749728 Basic stats: COMPLETE Column stats: COMPLETE
                                              Reduce Output Operator
                                                key expressions: _col3 (type: int)
                                                sort order: +
                                                Map-reduce partition columns: _col3 (type: int)
                                                Statistics: Num rows: 968716 Data size: 7749728 Basic stats: COMPLETE Column stats: COMPLETE
                                                value expressions: _col4 (type: int), _col13 (type: int)
              Map 8
                  Map Operator Tree:
                      TableScan
                        alias: d1
                        filterExpr: (((d_year = 2001) and d_date_sk is not null) and d_week_seq is not null) (type: boolean)
                        Statistics: Num rows: 73049 Data size: 81741831 Basic stats: COMPLETE Column stats: COMPLETE
                        Filter Operator
                          predicate: (((d_year = 2001) and d_date_sk is not null) and d_week_seq is not null) (type: boolean)
                          Statistics: Num rows: 652 Data size: 69112 Basic stats: COMPLETE Column stats: COMPLETE
                          Select Operator
                            expressions: d_date_sk (type: int), d_date (type: string), d_week_seq (type: int)
                            outputColumnNames: _col0, _col1, _col2
                            Statistics: Num rows: 652 Data size: 66504 Basic stats: COMPLETE Column stats: COMPLETE
                            Reduce Output Operator
                              key expressions: _col0 (type: int)
                              sort order: +
                              Map-reduce partition columns: _col0 (type: int)
                              Statistics: Num rows: 652 Data size: 66504 Basic stats: COMPLETE Column stats: COMPLETE
                              value expressions: _col1 (type: string), _col2 (type: int)
                            Select Operator
                              expressions: _col0 (type: int)
                              outputColumnNames: _col0
                              Statistics: Num rows: 652 Data size: 66504 Basic stats: COMPLETE Column stats: COMPLETE
                              Group By Operator
                                keys: _col0 (type: int)
                                mode: hash
                                outputColumnNames: _col0
                                Statistics: Num rows: 652 Data size: 66504 Basic stats: COMPLETE Column stats: COMPLETE
                                Dynamic Partitioning Event Operator
                                  Target Input: catalog_sales
                                  Partition key expr: cs_sold_date_sk
                                  Statistics: Num rows: 652 Data size: 66504 Basic stats: COMPLETE Column stats: COMPLETE
                                  Target column: cs_sold_date_sk
                                  Target Vertex: Map 7
              Map 9
                  Map Operator Tree:
                      TableScan
                        alias: household_demographics
                        filterExpr: ((hd_buy_potential = '1001-5000') and hd_demo_sk is not null) (type: boolean)
                        Statistics: Num rows: 7200 Data size: 770400 Basic stats: COMPLETE Column stats: COMPLETE
                        Filter Operator
                          predicate: ((hd_buy_potential = '1001-5000') and hd_demo_sk is not null) (type: boolean)
                          Statistics: Num rows: 1440 Data size: 138240 Basic stats: COMPLETE Column stats: COMPLETE
                          Select Operator
                            expressions: hd_demo_sk (type: int)
                            outputColumnNames: _col0
                            Statistics: Num rows: 1440 Data size: 5760 Basic stats: COMPLETE Column stats: COMPLETE
                            Reduce Output Operator
                              key expressions: _col0 (type: int)
                              sort order: +
                              Map-reduce partition columns: _col0 (type: int)
                              Statistics: Num rows: 1440 Data size: 5760 Basic stats: COMPLETE Column stats: COMPLETE
              Reducer 3
                  Reduce Operator Tree:
                    Group By Operator
                      aggregations: count(VALUE._col0)
                      mode: mergepartial
                      outputColumnNames: _col0
                      Statistics: Num rows: 1 Data size: 8 Basic stats: COMPLETE Column stats: COMPLETE
                      File Output Operator
                        compressed: false
                        Statistics: Num rows: 1 Data size: 8 Basic stats: COMPLETE Column stats: COMPLETE
                        table:
                            input format: org.apache.hadoop.mapred.TextInputFormat
                            output format: org.apache.hadoop.hive.ql.io.HiveIgnoreKeyTextOutputFormat
                            serde: org.apache.hadoop.hive.serde2.lazy.LazySimpleSerDe
      
        Stage: Stage-0
          Fetch Operator
            limit: -1
            Processor Tree:
              ListSink
      

      Attachments

        Activity

          People

            jpullokkaran Laljo John Pullokkaran
            mmokhtar Mostafa Mokhtar
            Votes:
            0 Vote for this issue
            Watchers:
            2 Start watching this issue

            Dates

              Created:
              Updated: