HIVE-9695

Redundant filter operator in reducer Vertex when CBO is disabled

Details

    • Type: Improvement
    • Status: Closed
    • Priority: Major
    • Resolution: Fixed
    • Affects Version/s: 2.0.0
    • Fix Version/s: 2.0.0
    • Component/s: Logical Optimizer
    • Labels: None

    Description

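      When CBO is disabled, the plan contains a redundant Filter Operator in the reducer vertex. It re-evaluates the join equalities after the joins have already been applied on the same keys, together with the two BETWEEN predicates that already appear in the filterExpr of the table scans. With CBO enabled, the reducer has no Filter Operator.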

      Query

      select 
              ss_item_sk, ss_ticket_number, ss_store_sk
          from
              store_sales a, store_returns b, store
          where
              a.ss_item_sk = b.sr_item_sk
                  and a.ss_ticket_number = b.sr_ticket_number
                  and ss_sold_date_sk between 2450816 and 2451500
                  and sr_returned_date_sk between 2450816 and 2451500
                  and s_store_sk = ss_store_sk;
      

      Plan snippet

                        Statistics: Num rows: 57439344 Data size: 1838059008 Basic stats: COMPLETE Column stats: COMPLETE
                        Filter Operator
                          predicate: (((((_col1 = _col27) and (_col8 = _col34)) and _col22 BETWEEN 2450816 AND 2451500) and _col45 BETWEEN 2450816 AND 2451500) and (_col49 = _col6)) (type: boolean)
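
      One way to see why this filter is redundant is to compare its conjuncts with the predicates that the plan already applies upstream. The Python sketch below is only an illustration, not Hive code. The mapping from _colN names to source columns is my reading of the join outputs in the CBO-disabled plan, and treating the BETWEEN predicates in the scans' filterExpr as already enforced is an assumption of the sketch. The helper names (equality, between, residual) are hypothetical.

```python
# Internal reducer column -> source column. This mapping is inferred from the
# query and the join output columns in the plan; it is an assumption.
COLUMN_ORIGIN = {
    "_col1": "ss_item_sk",
    "_col6": "ss_store_sk",
    "_col8": "ss_ticket_number",
    "_col22": "ss_sold_date_sk",
    "_col27": "sr_item_sk",
    "_col34": "sr_ticket_number",
    "_col45": "sr_returned_date_sk",
    "_col49": "s_store_sk",
}


def equality(left, right):
    """Order-insensitive representation of `left = right`."""
    return ("eq", frozenset((left, right)))


def between(column, low, high):
    """Representation of `column BETWEEN low AND high`."""
    return ("between", column, low, high)


# Conjuncts of the reducer-side Filter Operator, rewritten on source columns.
reducer_filter = {
    equality(COLUMN_ORIGIN["_col1"], COLUMN_ORIGIN["_col27"]),
    equality(COLUMN_ORIGIN["_col8"], COLUMN_ORIGIN["_col34"]),
    between(COLUMN_ORIGIN["_col22"], 2450816, 2451500),
    between(COLUMN_ORIGIN["_col45"], 2450816, 2451500),
    equality(COLUMN_ORIGIN["_col49"], COLUMN_ORIGIN["_col6"]),
}

# Predicates assumed to be enforced before the filter runs.
enforced_upstream = {
    equality("ss_item_sk", "sr_item_sk"),              # Merge Join key
    equality("ss_ticket_number", "sr_ticket_number"),  # Merge Join key
    equality("ss_store_sk", "s_store_sk"),             # Map Join key
    between("ss_sold_date_sk", 2450816, 2451500),      # filterExpr on alias a
    between("sr_returned_date_sk", 2450816, 2451500),  # filterExpr on alias b
}


def residual(filter_conjuncts, already_enforced):
    """Return the conjuncts that still need to be evaluated."""
    return filter_conjuncts - already_enforced


remaining = residual(reducer_filter, enforced_upstream)
print(len(remaining))  # number of conjuncts the filter still has to check
```

      Under these assumptions no conjunct is left over, which is what makes the whole operator removable rather than just simplifiable.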
      

      Full plan with CBO disabled

      STAGE DEPENDENCIES:
        Stage-1 is a root stage
        Stage-0 depends on stages: Stage-1
      
      STAGE PLANS:
        Stage: Stage-1
          Tez
            Edges:
              Reducer 2 <- Map 1 (SIMPLE_EDGE), Map 3 (BROADCAST_EDGE), Map 4 (SIMPLE_EDGE)
            DagName: mmokhtar_20150214182626_ad6820c7-b667-4652-ab25-cb60deed1a6d:13
            Vertices:
              Map 1
                  Map Operator Tree:
                      TableScan
                        alias: b
                        filterExpr: ((sr_item_sk is not null and sr_ticket_number is not null) and sr_returned_date_sk BETWEEN 2450816 AND 2451500) (type: boolean)
                        Statistics: Num rows: 2370038095 Data size: 170506118656 Basic stats: COMPLETE Column stats: COMPLETE
                        Filter Operator
                          predicate: (sr_item_sk is not null and sr_ticket_number is not null) (type: boolean)
                          Statistics: Num rows: 706893063 Data size: 6498502768 Basic stats: COMPLETE Column stats: COMPLETE
                          Reduce Output Operator
                            key expressions: sr_item_sk (type: int), sr_ticket_number (type: int)
                            sort order: ++
                            Map-reduce partition columns: sr_item_sk (type: int), sr_ticket_number (type: int)
                            Statistics: Num rows: 706893063 Data size: 6498502768 Basic stats: COMPLETE Column stats: COMPLETE
                            value expressions: sr_returned_date_sk (type: int)
                  Execution mode: vectorized
              Map 3
                  Map Operator Tree:
                      TableScan
                        alias: store
                        filterExpr: s_store_sk is not null (type: boolean)
                        Statistics: Num rows: 1704 Data size: 3256276 Basic stats: COMPLETE Column stats: COMPLETE
                        Filter Operator
                          predicate: s_store_sk is not null (type: boolean)
                          Statistics: Num rows: 1704 Data size: 6816 Basic stats: COMPLETE Column stats: COMPLETE
                          Reduce Output Operator
                            key expressions: s_store_sk (type: int)
                            sort order: +
                            Map-reduce partition columns: s_store_sk (type: int)
                            Statistics: Num rows: 1704 Data size: 6816 Basic stats: COMPLETE Column stats: COMPLETE
                  Execution mode: vectorized
              Map 4
                  Map Operator Tree:
                      TableScan
                        alias: a
                        filterExpr: (((ss_item_sk is not null and ss_ticket_number is not null) and ss_store_sk is not null) and ss_sold_date_sk BETWEEN 2450816 AND 2451500) (type: boolean)
                        Statistics: Num rows: 28878719387 Data size: 2405805439460 Basic stats: COMPLETE Column stats: COMPLETE
                        Filter Operator
                          predicate: ((ss_item_sk is not null and ss_ticket_number is not null) and ss_store_sk is not null) (type: boolean)
                          Statistics: Num rows: 8405840828 Data size: 110101408700 Basic stats: COMPLETE Column stats: COMPLETE
                          Reduce Output Operator
                            key expressions: ss_item_sk (type: int), ss_ticket_number (type: int)
                            sort order: ++
                            Map-reduce partition columns: ss_item_sk (type: int), ss_ticket_number (type: int)
                            Statistics: Num rows: 8405840828 Data size: 110101408700 Basic stats: COMPLETE Column stats: COMPLETE
                            value expressions: ss_store_sk (type: int), ss_sold_date_sk (type: int)
                  Execution mode: vectorized
              Reducer 2
                  Reduce Operator Tree:
                    Merge Join Operator
                      condition map:
                           Inner Join 0 to 1
                      condition expressions:
                        0 {KEY.reducesinkkey0} {VALUE._col5} {KEY.reducesinkkey1} {VALUE._col20}
                        1 {KEY.reducesinkkey0} {KEY.reducesinkkey1} {VALUE._col17}
                      outputColumnNames: _col1, _col6, _col8, _col22, _col27, _col34, _col45
                      Statistics: Num rows: 57439343 Data size: 1148786860 Basic stats: COMPLETE Column stats: COMPLETE
                      Map Join Operator
                        condition map:
                             Inner Join 0 to 1
                        condition expressions:
                          0 {_col1} {_col6} {_col8} {_col22} {_col27} {_col34} {_col45}
                          1 {s_store_sk}
                        keys:
                          0 _col6 (type: int)
                          1 s_store_sk (type: int)
                        outputColumnNames: _col1, _col6, _col8, _col22, _col27, _col34, _col45, _col49
                        input vertices:
                          1 Map 3
                        Statistics: Num rows: 57439344 Data size: 1838059008 Basic stats: COMPLETE Column stats: COMPLETE
                        Filter Operator
                          predicate: (((((_col1 = _col27) and (_col8 = _col34)) and _col22 BETWEEN 2450816 AND 2451500) and _col45 BETWEEN 2450816 AND 2451500) and (_col49 = _col6)) (type: boolean)
                          Statistics: Num rows: 1794979 Data size: 57439328 Basic stats: COMPLETE Column stats: COMPLETE
                          Select Operator
                            expressions: _col1 (type: int), _col8 (type: int), _col6 (type: int)
                            outputColumnNames: _col0, _col1, _col2
                            Statistics: Num rows: 1794979 Data size: 21539748 Basic stats: COMPLETE Column stats: COMPLETE
                            File Output Operator
                              compressed: false
                              Statistics: Num rows: 1794979 Data size: 21539748 Basic stats: COMPLETE Column stats: COMPLETE
                              table:
                                  input format: org.apache.hadoop.mapred.TextInputFormat
                                  output format: org.apache.hadoop.hive.ql.io.HiveIgnoreKeyTextOutputFormat
                                  serde: org.apache.hadoop.hive.serde2.lazy.LazySimpleSerDe
      
        Stage: Stage-0
          Fetch Operator
            limit: -1
            Processor Tree:
              ListSink
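
      The redundant filter also changes the row estimate for everything downstream of it in Reducer 2. The short Python check below uses only numbers copied from the plan above. It shows that the filter's output estimate equals the input estimate divided by 32, which would be consistent with the row count being halved once for each of the five conjuncts. That interpretation is an assumption on my part; the issue does not state how the estimate was derived.

```python
# Statistics copied from the Map Join Operator (filter input) and the
# Filter Operator (filter output) in Reducer 2 of the CBO-disabled plan.
rows_in = 57_439_344
bytes_in = 1_838_059_008
rows_out = 1_794_979
bytes_out = 57_439_328

# The filter predicate has five conjuncts.
conjuncts = 5

# Average row width implied by the input statistics.
avg_row_bytes = bytes_in // rows_in

# Assumption: each conjunct halves the estimated row count.
halved_rows = rows_in // (2 ** conjuncts)

print(avg_row_bytes, halved_rows, halved_rows * avg_row_bytes)
```

      If the filter were removed, the Select and File Output operators would presumably inherit the Map Join estimate instead of this reduced one.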
      

      Full plan with CBO enabled

      STAGE DEPENDENCIES:
        Stage-1 is a root stage
        Stage-0 depends on stages: Stage-1
      
      STAGE PLANS:
        Stage: Stage-1
          Tez
            Edges:
              Map 4 <- Map 1 (BROADCAST_EDGE)
              Reducer 3 <- Map 2 (SIMPLE_EDGE), Map 4 (SIMPLE_EDGE)
            DagName: mmokhtar_20150214182525_63a9838f-db9f-40e9-8ae1-77c77143dccf:12
            Vertices:
              Map 1
                  Map Operator Tree:
                      TableScan
                        alias: store
                        filterExpr: s_store_sk is not null (type: boolean)
                        Statistics: Num rows: 1704 Data size: 3256276 Basic stats: COMPLETE Column stats: COMPLETE
                        Filter Operator
                          predicate: s_store_sk is not null (type: boolean)
                          Statistics: Num rows: 1704 Data size: 6816 Basic stats: COMPLETE Column stats: COMPLETE
                          Select Operator
                            expressions: s_store_sk (type: int)
                            outputColumnNames: _col0
                            Statistics: Num rows: 1704 Data size: 6816 Basic stats: COMPLETE Column stats: COMPLETE
                            Reduce Output Operator
                              key expressions: _col0 (type: int)
                              sort order: +
                              Map-reduce partition columns: _col0 (type: int)
                              Statistics: Num rows: 1704 Data size: 6816 Basic stats: COMPLETE Column stats: COMPLETE
                  Execution mode: vectorized
              Map 2
                  Map Operator Tree:
                      TableScan
                        alias: b
                        filterExpr: (sr_item_sk is not null and sr_ticket_number is not null) (type: boolean)
                        Statistics: Num rows: 2370038095 Data size: 170506118656 Basic stats: COMPLETE Column stats: COMPLETE
                        Filter Operator
                          predicate: (sr_item_sk is not null and sr_ticket_number is not null) (type: boolean)
                          Statistics: Num rows: 706893063 Data size: 3670930516 Basic stats: COMPLETE Column stats: COMPLETE
                          Select Operator
                            expressions: sr_item_sk (type: int), sr_ticket_number (type: int)
                            outputColumnNames: _col0, _col1
                            Statistics: Num rows: 706893063 Data size: 3670930516 Basic stats: COMPLETE Column stats: COMPLETE
                            Reduce Output Operator
                              key expressions: _col0 (type: int), _col1 (type: int)
                              sort order: ++
                              Map-reduce partition columns: _col0 (type: int), _col1 (type: int)
                              Statistics: Num rows: 706893063 Data size: 3670930516 Basic stats: COMPLETE Column stats: COMPLETE
                  Execution mode: vectorized
              Map 4
                  Map Operator Tree:
                      TableScan
                        alias: a
                        filterExpr: ((ss_store_sk is not null and ss_item_sk is not null) and ss_ticket_number is not null) (type: boolean)
                        Statistics: Num rows: 28878719387 Data size: 2405805439460 Basic stats: COMPLETE Column stats: COMPLETE
                        Filter Operator
                          predicate: ((ss_store_sk is not null and ss_item_sk is not null) and ss_ticket_number is not null) (type: boolean)
                          Statistics: Num rows: 8405840828 Data size: 76478045388 Basic stats: COMPLETE Column stats: COMPLETE
                          Select Operator
                            expressions: ss_item_sk (type: int), ss_store_sk (type: int), ss_ticket_number (type: int)
                            outputColumnNames: _col0, _col1, _col2
                            Statistics: Num rows: 8405840828 Data size: 76478045388 Basic stats: COMPLETE Column stats: COMPLETE
                            Map Join Operator
                              condition map:
                                   Inner Join 0 to 1
                              condition expressions:
                                0 {_col0} {_col1} {_col2}
                                1
                              keys:
                                0 _col1 (type: int)
                                1 _col0 (type: int)
                              outputColumnNames: _col0, _col1, _col2
                              input vertices:
                                1 Map 1
                              Statistics: Num rows: 8405840896 Data size: 100870090752 Basic stats: COMPLETE Column stats: COMPLETE
                              Reduce Output Operator
                                key expressions: _col0 (type: int), _col2 (type: int)
                                sort order: ++
                                Map-reduce partition columns: _col0 (type: int), _col2 (type: int)
                                Statistics: Num rows: 8405840896 Data size: 100870090752 Basic stats: COMPLETE Column stats: COMPLETE
                                value expressions: _col1 (type: int)
                  Execution mode: vectorized
              Reducer 3
                  Reduce Operator Tree:
                    Merge Join Operator
                      condition map:
                           Inner Join 0 to 1
                      condition expressions:
                        0 {KEY.reducesinkkey0} {VALUE._col0} {KEY.reducesinkkey1}
                        1
                      outputColumnNames: _col0, _col1, _col2
                      Statistics: Num rows: 75912751 Data size: 910953012 Basic stats: COMPLETE Column stats: COMPLETE
                      Select Operator
                        expressions: _col0 (type: int), _col2 (type: int), _col1 (type: int)
                        outputColumnNames: _col0, _col1, _col2
                        Statistics: Num rows: 75912751 Data size: 910953012 Basic stats: COMPLETE Column stats: COMPLETE
                        File Output Operator
                          compressed: false
                          Statistics: Num rows: 75912751 Data size: 910953012 Basic stats: COMPLETE Column stats: COMPLETE
                          table:
                              input format: org.apache.hadoop.mapred.TextInputFormat
                              output format: org.apache.hadoop.hive.ql.io.HiveIgnoreKeyTextOutputFormat
                              serde: org.apache.hadoop.hive.serde2.lazy.LazySimpleSerDe
      
        Stage: Stage-0
          Fetch Operator
            limit: -1
            Processor Tree:
              ListSink
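
      The difference between the two plans can be checked mechanically by counting Filter Operator lines per Tez vertex in the EXPLAIN text. The Python sketch below is a hypothetical helper, not part of Hive, and it assumes the plain-text layout shown in this issue (a vertex header such as "Reducer 2" alone on its line). The two sample strings are abbreviated excerpts of the plans above.

```python
import re
from collections import Counter

# A vertex header is a line containing only "Map N" or "Reducer N".
# Edge lines such as "Reducer 2 <- Map 1 (SIMPLE_EDGE)" do not match.
VERTEX_HEADER = re.compile(r"^\s*((?:Map|Reducer) \d+)\s*$")


def filter_operators_per_vertex(explain_text):
    """Count 'Filter Operator' lines under each vertex of an EXPLAIN dump."""
    counts = Counter()
    vertex = None
    for line in explain_text.splitlines():
        stripped = line.strip()
        header = VERTEX_HEADER.match(line)
        if header:
            vertex = header.group(1)
        elif stripped.startswith("Stage:"):
            vertex = None  # a new stage starts; forget the previous vertex
        elif vertex is not None and stripped == "Filter Operator":
            counts[vertex] += 1
    return counts


def reducers_with_filters(explain_text):
    """Return the reducer vertices that contain at least one Filter Operator."""
    counts = filter_operators_per_vertex(explain_text)
    return sorted(name for name in counts if name.startswith("Reducer"))


# Abbreviated excerpts of the two plans in this issue.
cbo_disabled = """
    Map 1
        Map Operator Tree:
            TableScan
              Filter Operator
                Reduce Output Operator
    Reducer 2
        Reduce Operator Tree:
          Merge Join Operator
            Map Join Operator
              Filter Operator
                Select Operator
"""

cbo_enabled = """
    Map 2
        Map Operator Tree:
            TableScan
              Filter Operator
                Select Operator
    Reducer 3
        Reduce Operator Tree:
          Merge Join Operator
            Select Operator
"""

print(reducers_with_filters(cbo_disabled))
print(reducers_with_filters(cbo_enabled))
```

      A reducer-side filter is not always redundant, so a non-empty result only flags a plan for closer inspection.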
      

      Attachments

        1. HIVE-9695.01.patch
          101 kB
          jcamachorodriguez
        2. HIVE-9695.01.patch
          101 kB
          jcamachorodriguez
        3. HIVE-9695.patch
          18 kB
          jcamachorodriguez


            People

              Assignee: Jesús Camacho Rodríguez (jcamacho)
              Reporter: Mostafa Mokhtar (mmokhtar)