Hive / HIVE-9068

Hive: With CBO disabled, vectorization of the map join is disabled, causing a 100% increase in elapsed time and CPU (possibly due to a redundant filter operator)


Details

    • Type: Bug
    • Status: Resolved
    • Priority: Major
    • Resolution: Incomplete
    • Affects Version/s: 0.14.0
    • Fix Version/s: 0.14.1
    • Component/s: Vectorization
    • Labels: None

    Description

      With CBO off, the plan contains a redundant filter operator:

       Filter Operator
                                predicate: ((null is null and (_col22 = _col51)) and (_col1 = _col26)) (type: boolean)
      

      This is possibly why vectorization is disabled with CBO off; the operator does not appear in the plan with CBO on.

      Query

      select 
          count(*)
      from
          (SELECT 
              'store' as channel,
                  'ss_addr_sk' col_name,
                  d_year,
                  d_qoy,
                  i_category,
                  ss_ext_sales_price ext_sales_price
          FROM
              store_sales, item, date_dim
          WHERE
              ss_addr_sk IS NULL
                  AND store_sales.ss_sold_date_sk = date_dim.d_date_sk
                  AND store_sales.ss_item_sk = item.i_item_sk) a;
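
The two plans in this report can presumably be regenerated by toggling the optimizer between two EXPLAIN runs of the query above. The following is only a sketch: `hive.execution.engine`, `hive.vectorized.execution.enabled` and `hive.cbo.enable` are standard Hive properties, but the report does not list the session settings actually used, so the values here are assumptions.

```sql
-- Assumed session settings; the reporter's exact configuration is not given.
SET hive.execution.engine=tez;
SET hive.vectorized.execution.enabled=true;

-- Run 1: cost-based optimizer off (the case that shows the extra Filter Operator).
SET hive.cbo.enable=false;
EXPLAIN
select
    count(*)
from
    (SELECT
        'store' as channel,
            'ss_addr_sk' col_name,
            d_year,
            d_qoy,
            i_category,
            ss_ext_sales_price ext_sales_price
    FROM
        store_sales, item, date_dim
    WHERE
        ss_addr_sk IS NULL
            AND store_sales.ss_sold_date_sk = date_dim.d_date_sk
            AND store_sales.ss_item_sk = item.i_item_sk) a;

-- Run 2: cost-based optimizer on; repeat the same EXPLAIN statement
-- and compare the "Map 1" vertex of the two plans.
SET hive.cbo.enable=true;
```

When comparing the output, the lines of interest are the `Filter Operator` that follows the second `Map Join Operator` and the presence or absence of `Execution mode: vectorized` at the end of the `Map 1` vertex.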
      

      Explain with CBO off

      STAGE DEPENDENCIES:
        Stage-1 is a root stage
        Stage-0 depends on stages: Stage-1
      
      STAGE PLANS:
        Stage: Stage-1
          Tez
            Edges:
              Map 1 <- Map 3 (BROADCAST_EDGE), Map 4 (BROADCAST_EDGE)
              Reducer 2 <- Map 1 (SIMPLE_EDGE)
            DagName: mmokhtar_20141210171212_02c36f60-ceea-4e18-a266-5baecfd023f2:6
            Vertices:
              Map 1
                  Map Operator Tree:
                      TableScan
                        alias: store_sales
                        filterExpr: (ss_item_sk is not null and ss_addr_sk is null) (type: boolean)
                        Statistics: Num rows: 82510879939 Data size: 6873789738208 Basic stats: COMPLETE Column stats: COMPLETE
                        Filter Operator
                          predicate: (ss_item_sk is not null and ss_addr_sk is null) (type: boolean)
                          Statistics: Num rows: 1946839900 Data size: 23178336456 Basic stats: COMPLETE Column stats: COMPLETE
                          Map Join Operator
                            condition map:
                                 Inner Join 0 to 1
                            condition expressions:
                              0 {ss_item_sk} {ss_sold_date_sk}
                              1 {i_item_sk}
                            keys:
                              0 ss_item_sk (type: int)
                              1 i_item_sk (type: int)
                            outputColumnNames: _col1, _col22, _col26
                            input vertices:
                              1 Map 4
                            Statistics: Num rows: 1946839936 Data size: 23362079232 Basic stats: COMPLETE Column stats: COMPLETE
                            Map Join Operator
                              condition map:
                                   Inner Join 0 to 1
                              condition expressions:
                                0 {_col1} {_col22} {_col26}
                                1 {d_date_sk}
                              keys:
                                0 _col22 (type: int)
                                1 d_date_sk (type: int)
                              outputColumnNames: _col1, _col22, _col26, _col51
                              input vertices:
                                1 Map 3
                              Statistics: Num rows: 2176800197 Data size: 34828803152 Basic stats: COMPLETE Column stats: COMPLETE
                              Filter Operator
                                predicate: ((null is null and (_col22 = _col51)) and (_col1 = _col26)) (type: boolean)
                                Statistics: Num rows: 272100024 Data size: 4353600384 Basic stats: COMPLETE Column stats: COMPLETE
                                Select Operator
                                  Statistics: Num rows: 272100024 Data size: 4353600384 Basic stats: COMPLETE Column stats: COMPLETE
                                  Group By Operator
                                    aggregations: count()
                                    mode: hash
                                    outputColumnNames: _col0
                                    Statistics: Num rows: 1 Data size: 8 Basic stats: COMPLETE Column stats: COMPLETE
                                    Reduce Output Operator
                                      sort order:
                                      Statistics: Num rows: 1 Data size: 8 Basic stats: COMPLETE Column stats: COMPLETE
                                      value expressions: _col0 (type: bigint)
              Map 3
                  Map Operator Tree:
                      TableScan
                        alias: date_dim
                        filterExpr: d_date_sk is not null (type: boolean)
                        Statistics: Num rows: 73049 Data size: 81741831 Basic stats: COMPLETE Column stats: COMPLETE
                        Filter Operator
                          predicate: d_date_sk is not null (type: boolean)
                          Statistics: Num rows: 73049 Data size: 292196 Basic stats: COMPLETE Column stats: COMPLETE
                          Reduce Output Operator
                            key expressions: d_date_sk (type: int)
                            sort order: +
                            Map-reduce partition columns: d_date_sk (type: int)
                            Statistics: Num rows: 73049 Data size: 292196 Basic stats: COMPLETE Column stats: COMPLETE
                          Select Operator
                            expressions: d_date_sk (type: int)
                            outputColumnNames: _col0
                            Statistics: Num rows: 73049 Data size: 292196 Basic stats: COMPLETE Column stats: COMPLETE
                            Group By Operator
                              keys: _col0 (type: int)
                              mode: hash
                              outputColumnNames: _col0
                              Statistics: Num rows: 36524 Data size: 146096 Basic stats: COMPLETE Column stats: COMPLETE
                              Dynamic Partitioning Event Operator
                                Target Input: store_sales
                                Partition key expr: ss_sold_date_sk
                                Statistics: Num rows: 36524 Data size: 146096 Basic stats: COMPLETE Column stats: COMPLETE
                                Target column: ss_sold_date_sk
                                Target Vertex: Map 1
                  Execution mode: vectorized
              Map 4
                  Map Operator Tree:
                      TableScan
                        alias: item
                        filterExpr: i_item_sk is not null (type: boolean)
                        Statistics: Num rows: 462000 Data size: 663862160 Basic stats: COMPLETE Column stats: COMPLETE
                        Filter Operator
                          predicate: i_item_sk is not null (type: boolean)
                          Statistics: Num rows: 462000 Data size: 1848000 Basic stats: COMPLETE Column stats: COMPLETE
                          Reduce Output Operator
                            key expressions: i_item_sk (type: int)
                            sort order: +
                            Map-reduce partition columns: i_item_sk (type: int)
                            Statistics: Num rows: 462000 Data size: 1848000 Basic stats: COMPLETE Column stats: COMPLETE
                  Execution mode: vectorized
              Reducer 2
                  Reduce Operator Tree:
                    Group By Operator
                      aggregations: count(VALUE._col0)
                      mode: mergepartial
                      outputColumnNames: _col0
                      Statistics: Num rows: 1 Data size: 8 Basic stats: COMPLETE Column stats: COMPLETE
                      Select Operator
                        expressions: _col0 (type: bigint)
                        outputColumnNames: _col0
                        Statistics: Num rows: 1 Data size: 8 Basic stats: COMPLETE Column stats: COMPLETE
                        File Output Operator
                          compressed: false
                          Statistics: Num rows: 1 Data size: 8 Basic stats: COMPLETE Column stats: COMPLETE
                          table:
                              input format: org.apache.hadoop.mapred.TextInputFormat
                              output format: org.apache.hadoop.hive.ql.io.HiveIgnoreKeyTextOutputFormat
                              serde: org.apache.hadoop.hive.serde2.lazy.LazySimpleSerDe
      
        Stage: Stage-0
          Fetch Operator
            limit: -1
            Processor Tree:
              ListSink
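
In the plan above, `Map 1` has no `Execution mode: vectorized` line, and the residual `Filter Operator` re-checks the two equalities that already serve as the map join keys. One experiment that might help isolate the problem is to run the same query with the join conditions written as explicit `JOIN ... ON` clauses while CBO stays off. The rewrite below is semantically equivalent for inner joins; whether it removes the residual filter or restores vectorization is not established by this report and is stated here only as something to test.

```sql
-- Experimental rewrite, not taken from the report: same result set,
-- join predicates moved from the WHERE clause into ON clauses.
SET hive.cbo.enable=false;
EXPLAIN
select
    count(*)
from
    (SELECT
        'store' as channel,
            'ss_addr_sk' col_name,
            d_year,
            d_qoy,
            i_category,
            ss_ext_sales_price ext_sales_price
    FROM
        store_sales
        JOIN item ON (store_sales.ss_item_sk = item.i_item_sk)
        JOIN date_dim ON (store_sales.ss_sold_date_sk = date_dim.d_date_sk)
    WHERE
        ss_addr_sk IS NULL) a;
```

If the `Filter Operator` after the second map join is absent from this plan and `Map 1` reports `Execution mode: vectorized`, that would support the suspicion that the residual filter is what blocks vectorization.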
      

      Explain with CBO on

      STAGE PLANS:
        Stage: Stage-1
          Tez
            Edges:
              Map 1 <- Map 3 (BROADCAST_EDGE), Map 4 (BROADCAST_EDGE)
              Reducer 2 <- Map 1 (SIMPLE_EDGE)
            DagName: mmokhtar_20141210171212_495d0eb9-d176-43d3-8101-84821a0c0fdf:5
            Vertices:
              Map 1
                  Map Operator Tree:
                      TableScan
                        alias: store_sales
                        filterExpr: (ss_addr_sk is null and ss_item_sk is not null) (type: boolean)
                        Statistics: Num rows: 82510879939 Data size: 6873789738208 Basic stats: COMPLETE Column stats: COMPLETE
                        Filter Operator
                          predicate: (ss_addr_sk is null and ss_item_sk is not null) (type: boolean)
                          Statistics: Num rows: 1946839900 Data size: 23178336456 Basic stats: COMPLETE Column stats: COMPLETE
                          Select Operator
                            expressions: ss_item_sk (type: int), ss_sold_date_sk (type: int)
                            outputColumnNames: _col0, _col2
                            Statistics: Num rows: 1946839900 Data size: 15574719200 Basic stats: COMPLETE Column stats: COMPLETE
                            Map Join Operator
                              condition map:
                                   Inner Join 0 to 1
                              condition expressions:
                                0
                                1 {_col2}
                              keys:
                                0 _col0 (type: int)
                                1 _col0 (type: int)
                              outputColumnNames: _col3
                              input vertices:
                                0 Map 4
                              Statistics: Num rows: 1946839936 Data size: 7787359744 Basic stats: COMPLETE Column stats: COMPLETE
                              Select Operator
                                expressions: _col3 (type: int)
                                outputColumnNames: _col3
                                Statistics: Num rows: 1946839936 Data size: 7787359744 Basic stats: COMPLETE Column stats: COMPLETE
                                Map Join Operator
                                  condition map:
                                       Inner Join 0 to 1
                                  condition expressions:
                                    0
                                    1
                                  keys:
                                    0 _col0 (type: int)
                                    1 _col3 (type: int)
                                  input vertices:
                                    0 Map 3
                                  Statistics: Num rows: 3232152511019 Data size: 0 Basic stats: PARTIAL Column stats: COMPLETE
                                  Select Operator
                                    Statistics: Num rows: 3232152511019 Data size: 0 Basic stats: PARTIAL Column stats: COMPLETE
                                    Group By Operator
                                      aggregations: count()
                                      mode: hash
                                      outputColumnNames: _col0
                                      Statistics: Num rows: 1 Data size: 8 Basic stats: COMPLETE Column stats: COMPLETE
                                      Reduce Output Operator
                                        sort order:
                                        Statistics: Num rows: 1 Data size: 8 Basic stats: COMPLETE Column stats: COMPLETE
                                        value expressions: _col0 (type: bigint)
                  Execution mode: vectorized
              Map 3
                  Map Operator Tree:
                      TableScan
                        alias: date_dim
                        filterExpr: d_date_sk is not null (type: boolean)
                        Statistics: Num rows: 73049 Data size: 81741831 Basic stats: COMPLETE Column stats: COMPLETE
                        Filter Operator
                          predicate: d_date_sk is not null (type: boolean)
                          Statistics: Num rows: 73049 Data size: 292196 Basic stats: COMPLETE Column stats: COMPLETE
                          Select Operator
                            expressions: d_date_sk (type: int)
                            outputColumnNames: _col0
                            Statistics: Num rows: 73049 Data size: 292196 Basic stats: COMPLETE Column stats: COMPLETE
                            Reduce Output Operator
                              key expressions: _col0 (type: int)
                              sort order: +
                              Map-reduce partition columns: _col0 (type: int)
                              Statistics: Num rows: 73049 Data size: 292196 Basic stats: COMPLETE Column stats: COMPLETE
                            Select Operator
                              expressions: _col0 (type: int)
                              outputColumnNames: _col0
                              Statistics: Num rows: 73049 Data size: 292196 Basic stats: COMPLETE Column stats: COMPLETE
                              Group By Operator
                                keys: _col0 (type: int)
                                mode: hash
                                outputColumnNames: _col0
                                Statistics: Num rows: 36524 Data size: 146096 Basic stats: COMPLETE Column stats: COMPLETE
                                Dynamic Partitioning Event Operator
                                  Target Input: store_sales
                                  Partition key expr: ss_sold_date_sk
                                  Statistics: Num rows: 36524 Data size: 146096 Basic stats: COMPLETE Column stats: COMPLETE
                                  Target column: ss_sold_date_sk
                                  Target Vertex: Map 1
                  Execution mode: vectorized
              Map 4
                  Map Operator Tree:
                      TableScan
                        alias: item
                        filterExpr: i_item_sk is not null (type: boolean)
                        Statistics: Num rows: 462000 Data size: 663862160 Basic stats: COMPLETE Column stats: COMPLETE
                        Filter Operator
                          predicate: i_item_sk is not null (type: boolean)
                          Statistics: Num rows: 462000 Data size: 1848000 Basic stats: COMPLETE Column stats: COMPLETE
                          Select Operator
                            expressions: i_item_sk (type: int)
                            outputColumnNames: _col0
                            Statistics: Num rows: 462000 Data size: 1848000 Basic stats: COMPLETE Column stats: COMPLETE
                            Reduce Output Operator
                              key expressions: _col0 (type: int)
                              sort order: +
                              Map-reduce partition columns: _col0 (type: int)
                              Statistics: Num rows: 462000 Data size: 1848000 Basic stats: COMPLETE Column stats: COMPLETE
                  Execution mode: vectorized
              Reducer 2
                  Reduce Operator Tree:
                    Group By Operator
                      aggregations: count(VALUE._col0)
                      mode: mergepartial
                      outputColumnNames: _col0
                      Statistics: Num rows: 1 Data size: 8 Basic stats: COMPLETE Column stats: COMPLETE
                      Select Operator
                        expressions: _col0 (type: bigint)
                        outputColumnNames: _col0
                        Statistics: Num rows: 1 Data size: 8 Basic stats: COMPLETE Column stats: COMPLETE
                        File Output Operator
                          compressed: false
                          Statistics: Num rows: 1 Data size: 8 Basic stats: COMPLETE Column stats: COMPLETE
                          table:
                              input format: org.apache.hadoop.mapred.TextInputFormat
                              output format: org.apache.hadoop.hive.ql.io.HiveIgnoreKeyTextOutputFormat
                              serde: org.apache.hadoop.hive.serde2.lazy.LazySimpleSerDe
      
        Stage: Stage-0
          Fetch Operator
            limit: -1
            Processor Tree:
              ListSink
      
      Time taken: 3.874 seconds, Fetched: 144 row(s)
      

          People

            Assignee: Matt McCline (mmccline)
            Reporter: Mostafa Mokhtar (mmokhtar)