Uploaded image for project: 'Hive'
  1. Hive
  2. HIVE-5369 Annotate hive operator tree with statistics from metastore
  3. HIVE-8168

With dynamic partition enabled fact table selectivity is not taken into account when generating the physical plan (Use CBO cardinality using physical plan generation)

    XMLWordPrintableJSON

Details

    • Sub-task
    • Status: Closed
    • Critical
    • Resolution: Fixed
    • 0.14.0
    • 0.14.0
    • Tez
    • hive

    Description

      When calculating estimate row counts & data size during physical plan generation in StatsRulesProcFactory doesn't know that there will be dynamic partition pruning and it is hard to know how many partitions will qualify at runtime, as a result with Dynamic partition pruning enabled a query 32 can run with 570 compared to 70 tasks with dynamic partition pruning disabled and actual partition filters on the fact table.

      The long term solution for this issue is to use the cardinality estimates from CBO as it takes into account join selectivity and such, estimate from CBO won't address the number of the tasks used for the partitioned table but they will address the incorrect number of tasks used for the concequent reducers where the majority of the slowdown is coming from.

      Plan dynamic partition pruning on

         Map 5 
                  Map Operator Tree:
                      TableScan
                        alias: ss
                        filterExpr: ss_store_sk is not null (type: boolean)
                        Statistics: Num rows: 550076554 Data size: 47370018896 Basic stats: COMPLETE Column stats: NONE
                        Filter Operator
                          predicate: ss_store_sk is not null (type: boolean)
                          Statistics: Num rows: 275038277 Data size: 23685009448 Basic stats: COMPLETE Column stats: NONE
                          Map Join Operator
                            condition map:
                                 Inner Join 0 to 1
                            condition expressions:
                              0 {ss_store_sk} {ss_net_profit}
                              1 
                            keys:
                              0 ss_sold_date_sk (type: int)
                              1 d_date_sk (type: int)
                            outputColumnNames: _col6, _col21
                            input vertices:
                              1 Map 1
                            Statistics: Num rows: 302542112 Data size: 26053511168 Basic stats: COMPLETE Column stats: NONE
                            Map Join Operator
                              condition map:
                                   Inner Join 0 to 1
                              condition expressions:
                                0 {_col21}
                                1 {s_county} {s_state}
                              keys:
                                0 _col6 (type: int)
                                1 s_store_sk (type: int)
                              outputColumnNames: _col21, _col80, _col81
                              input vertices:
                                1 Map 2
                              Statistics: Num rows: 332796320 Data size: 28658862080 Basic stats: COMPLETE Column stats: NONE
                              Map Join Operator
                                condition map:
                                     Left Semi Join 0 to 1
                                condition expressions:
                                  0 {_col21} {_col80} {_col81}
                                  1 
                                keys:
                                  0 _col81 (type: string)
                                  1 _col0 (type: string)
                                outputColumnNames: _col21, _col80, _col81
                                input vertices:
                                  1 Reducer 11
                                Statistics: Num rows: 366075968 Data size: 31524749312 Basic stats: COMPLETE Column stats: NONE
                                Select Operator
                                  expressions: _col81 (type: string), _col80 (type: string), _col21 (type: float)
                                  outputColumnNames: _col81, _col80, _col21
                                  Statistics: Num rows: 366075968 Data size: 31524749312 Basic stats: COMPLETE Column stats: NONE
                                  Group By Operator
                                    aggregations: sum(_col21)
                                    keys: _col81 (type: string), _col80 (type: string), '0' (type: string)
                                    mode: hash
                                    outputColumnNames: _col0, _col1, _col2, _col3
                                    Statistics: Num rows: 1098227904 Data size: 94574247936 Basic stats: COMPLETE Column stats: NONE
                                    Reduce Output Operator
                                      key expressions: _col0 (type: string), _col1 (type: string), _col2 (type: string)
                                      sort order: +++
                                      Map-reduce partition columns: _col0 (type: string), _col1 (type: string), _col2 (type: string)
                                      Statistics: Num rows: 1098227904 Data size: 94574247936 Basic stats: COMPLETE Column stats: NONE
                                      value expressions: _col3 (type: double)
      

      Plan snippet with partition pruning off and explicit partition filters

          Map 5 
                  Map Operator Tree:
                      TableScan
                        alias: ss
                        filterExpr: ((ss_sold_date_sk is not null and ss_store_sk is not null) and ss_sold_date BETWEEN '1999-06-01' AND '2000-05-31') (type: boolean)
                        Statistics: Num rows: 110339135 Data size: 4817453454 Basic stats: COMPLETE Column stats: NONE
                        Filter Operator
                          predicate: (ss_sold_date_sk is not null and ss_store_sk is not null) (type: boolean)
                          Statistics: Num rows: 27584784 Data size: 1204363374 Basic stats: COMPLETE Column stats: NONE
                          Map Join Operator
                            condition map:
                                 Inner Join 0 to 1
                            condition expressions:
                              0 {ss_store_sk} {ss_net_profit}
                              1 
                            keys:
                              0 ss_sold_date_sk (type: int)
                              1 d_date_sk (type: int)
                            outputColumnNames: _col7, _col22
                            input vertices:
                              1 Map 1
                            Statistics: Num rows: 30343264 Data size: 1324799744 Basic stats: COMPLETE Column stats: NONE
                            Map Join Operator
                              condition map:
                                   Inner Join 0 to 1
                              condition expressions:
                                0 {_col22}
                                1 {s_county} {s_state}
                              keys:
                                0 _col7 (type: int)
                                1 s_store_sk (type: int)
                              outputColumnNames: _col22, _col81, _col82
                              input vertices:
                                1 Map 2
                              Statistics: Num rows: 33377592 Data size: 1457279744 Basic stats: COMPLETE Column stats: NONE
                              Map Join Operator
                                condition map:
                                     Left Semi Join 0 to 1
                                condition expressions:
                                  0 {_col22} {_col81} {_col82}
                                  1 
                                keys:
                                  0 _col82 (type: string)
                                  1 _col0 (type: string)
                                outputColumnNames: _col22, _col81, _col82
                                input vertices:
                                  1 Reducer 11
                                Statistics: Num rows: 36715352 Data size: 1603007744 Basic stats: COMPLETE Column stats: NONE
                                Select Operator
                                  expressions: _col82 (type: string), _col81 (type: string), _col22 (type: float)
                                  outputColumnNames: _col82, _col81, _col22
                                  Statistics: Num rows: 36715352 Data size: 1603007744 Basic stats: COMPLETE Column stats: NONE
                                  Group By Operator
                                    aggregations: sum(_col22)
                                    keys: _col82 (type: string), _col81 (type: string), '0' (type: string)
                                    mode: hash
                                    outputColumnNames: _col0, _col1, _col2, _col3
                                    Statistics: Num rows: 36715352 Data size: 1603007744 Basic stats: COMPLETE Column stats: NONE
                                    Reduce Output Operator
                                      key expressions: _col0 (type: string), _col1 (type: string), _col2 (type: string)
                                      sort order: +++
                                      Map-reduce partition columns: _col0 (type: string), _col1 (type: string), _col2 (type: string)
                                      Statistics: Num rows: 36715352 Data size: 1603007744 Basic stats: COMPLETE Column stats: NONE
                                      value expressions: _col3 (type: double)
      

      Plan snippet with partition pruning on and fetch column stats also on , in this case row estimate a

            Map 5 
                  Map Operator Tree:
                      TableScan
                        alias: ss
                        filterExpr: ss_store_sk is not null (type: boolean)
                        Statistics: Num rows: 550076554 Data size: 47370018896 Basic stats: COMPLETE Column stats: COMPLETE
                        Filter Operator
                          predicate: ss_store_sk is not null (type: boolean)
                          Statistics: Num rows: 537120379 Data size: 4195767284 Basic stats: COMPLETE Column stats: COMPLETE
                          Map Join Operator
                            condition map:
                                 Inner Join 0 to 1
                            condition expressions:
                              0 {ss_store_sk} {ss_net_profit}
                              1 
                            keys:
                              0 ss_sold_date_sk (type: int)
                              1 d_date_sk (type: int)
                            outputColumnNames: _col6, _col21
                            input vertices:
                              1 Map 1
                            Statistics: Num rows: 36524 Data size: 292192 Basic stats: COMPLETE Column stats: COMPLETE
                            Map Join Operator
                              condition map:
                                   Inner Join 0 to 1
                              condition expressions:
                                0 {_col21}
                                1 {s_county} {s_state}
                              keys:
                                0 _col6 (type: int)
                                1 s_store_sk (type: int)
                              outputColumnNames: _col21, _col80, _col81
                              input vertices:
                                1 Map 2
                              Statistics: Num rows: 45017 Data size: 8508213 Basic stats: COMPLETE Column stats: COMPLETE
                              Map Join Operator
                                condition map:
                                     Left Semi Join 0 to 1
                                condition expressions:
                                  0 {_col21} {_col80} {_col81}
                                  1 
                                keys:
                                  0 _col81 (type: string)
                                  1 _col0 (type: string)
                                outputColumnNames: _col21, _col80, _col81
                                input vertices:
                                  1 Reducer 11
                                Statistics: Num rows: 49518 Data size: 9359035 Basic stats: COMPLETE Column stats: NONE
                                Select Operator
                                  expressions: _col81 (type: string), _col80 (type: string), _col21 (type: float)
                                  outputColumnNames: _col81, _col80, _col21
                                  Statistics: Num rows: 49518 Data size: 9359035 Basic stats: COMPLETE Column stats: NONE
                                  Group By Operator
                                    aggregations: sum(_col21)
                                    keys: _col81 (type: string), _col80 (type: string), '0' (type: string)
                                    mode: hash
                                    outputColumnNames: _col0, _col1, _col2, _col3
                                    Statistics: Num rows: 148554 Data size: 28077105 Basic stats: COMPLETE Column stats: NONE
                                    Reduce Output Operator
                                      key expressions: _col0 (type: string), _col1 (type: string), _col2 (type: string)
                                      sort order: +++
                                      Map-reduce partition columns: _col0 (type: string), _col1 (type: string), _col2 (type: string)
                                      Statistics: Num rows: 148554 Data size: 28077105 Basic stats: COMPLETE Column stats: NONE
                                      value expressions: _col3 (type: double)
      

      Attachments

        1. HIVE-8168.1.patch
          99 kB
          Prasanth Jayachandran
        2. HIVE-8168.2.patch
          77 kB
          Prasanth Jayachandran
        3. HIVE-8168.3.patch
          84 kB
          Prasanth Jayachandran
        4. HIVE-8168.4.patch
          87 kB
          Prasanth Jayachandran
        5. HIVE-8168.5.patch
          72 kB
          Prasanth Jayachandran

        Issue Links

          Activity

            People

              prasanth_j Prasanth Jayachandran
              mmokhtar Mostafa Mokhtar
              Votes:
              0 Vote for this issue
              Watchers:
              5 Start watching this issue

              Dates

                Created:
                Updated:
                Resolved: