Hive / HIVE-10107

Union All : Vertex missing stats resulting in OOM and inefficient plans

Details

    • Type: Bug
    • Status: Resolved
    • Priority: Major
    • Resolution: Fixed
    • Affects Version/s: 0.14.0
    • Fix Version/s: 1.2.1
    • Component/s: Physical Optimizer
    • Labels: None

    Description

      Reducer vertices that send data into a UNION ALL edge are missing statistics. As a result, we either use very few reducers on the UNION ALL edge or decide to broadcast the results of the UNION ALL.
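
To illustrate why a missing size estimate leaves the UNION ALL edge with too few reducers, here is a minimal Python sketch of a size-based reducer estimate. It is a simplified model and not Hive's actual code: the function name `estimate_reducers` is hypothetical, and the default values are assumptions loosely modelled on the `hive.exec.reducers.bytes.per.reducer` and `hive.exec.reducers.max` settings (real defaults depend on version and configuration). The two input sizes are taken from the plans in this report: the filtered output of Map 1, and the `Data size: 8` reported for Reducer 4.

```python
import math


def estimate_reducers(data_size_bytes, bytes_per_reducer=256_000_000, max_reducers=1009):
    """Simplified estimate: one reducer per `bytes_per_reducer` of input, clamped to [1, max]."""
    wanted = math.ceil(data_size_bytes / bytes_per_reducer)
    return max(1, min(max_reducers, wanted))


# Size reported for the filtered store_sales scan in Map 1 (stats present).
print(estimate_reducers(6549093948))  # 26

# Size reported for Reducer 4, downstream of the union (stats effectively missing).
print(estimate_reducers(8))  # 1
```

Under this model, any operator whose estimated size collapses to a few bytes ends up with a single reducer, regardless of how much data actually flows through the edge.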

      Query

      select 
          count(*) rowcount
      from
          (select 
              ss_item_sk, ss_ticket_number, ss_store_sk
          from
              store_sales a, store_returns b
          where
              a.ss_item_sk = b.sr_item_sk
                  and a.ss_ticket_number = b.sr_ticket_number
          union all
          select 
              ss_item_sk, ss_ticket_number, ss_store_sk
          from
              store_sales c, store_returns d
          where
              c.ss_item_sk = d.sr_item_sk
                  and c.ss_ticket_number = d.sr_ticket_number) t
      group by t.ss_store_sk , t.ss_item_sk , t.ss_ticket_number
      having rowcount > 100000000;
      

      Plan snippet

       Edges:
              Reducer 2 <- Map 1 (SIMPLE_EDGE), Map 5 (SIMPLE_EDGE), Union 3 (CONTAINS)
              Reducer 4 <- Union 3 (SIMPLE_EDGE)
              Reducer 7 <- Map 6 (SIMPLE_EDGE), Map 8 (SIMPLE_EDGE), Union 3 (CONTAINS)
      
        Reducer 4
                  Reduce Operator Tree:
                    Group By Operator
                      aggregations: count(VALUE._col0)
                      keys: KEY._col0 (type: int), KEY._col1 (type: int), KEY._col2 (type: int)
                      mode: mergepartial
                      outputColumnNames: _col0, _col1, _col2, _col3
                      Statistics: Num rows: 1 Data size: 8 Basic stats: COMPLETE Column stats: COMPLETE
                      Filter Operator
                        predicate: (_col3 > 100000000) (type: boolean)
                        Statistics: Num rows: 0 Data size: 0 Basic stats: NONE Column stats: COMPLETE
                        Select Operator
                          expressions: _col3 (type: bigint)
                          outputColumnNames: _col0
                          Statistics: Num rows: 0 Data size: 0 Basic stats: NONE Column stats: COMPLETE
                          File Output Operator
                            compressed: false
                            Statistics: Num rows: 0 Data size: 0 Basic stats: NONE Column stats: COMPLETE
                            table:
                                input format: org.apache.hadoop.mapred.TextInputFormat
                                output format: org.apache.hadoop.hive.ql.io.HiveIgnoreKeyTextOutputFormat
                                serde: org.apache.hadoop.hive.serde2.lazy.LazySimpleSerDe
              Reducer 7
                  Reduce Operator Tree:
                    Merge Join Operator
                      condition map:
                           Inner Join 0 to 1
                      keys:
                        0 ss_item_sk (type: int), ss_ticket_number (type: int)
                        1 sr_item_sk (type: int), sr_ticket_number (type: int)
                      outputColumnNames: _col1, _col6, _col8, _col27, _col34
                      Filter Operator
                        predicate: ((_col1 = _col27) and (_col8 = _col34)) (type: boolean)
                        Select Operator
                          expressions: _col1 (type: int), _col8 (type: int), _col6 (type: int)
                          outputColumnNames: _col0, _col1, _col2
                          Group By Operator
                            aggregations: count()
                            keys: _col2 (type: int), _col0 (type: int), _col1 (type: int)
                            mode: hash
                            outputColumnNames: _col0, _col1, _col2, _col3
                            Reduce Output Operator
                              key expressions: _col0 (type: int), _col1 (type: int), _col2 (type: int)
                              sort order: +++
                              Map-reduce partition columns: _col0 (type: int), _col1 (type: int), _col2 (type: int)
                              value expressions: _col3 (type: bigint)
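
In the snippet above, none of the operators in Reducer 7 carries a `Statistics:` line, while the operators in Reducer 4 do. A small script can flag such operators in a larger EXPLAIN dump. The following is a rough sketch only: the function name `operators_missing_stats` is hypothetical, and it assumes the text layout shown in this report (a vertex header such as `Reducer 7` on its own line, operator names ending in `Operator` or equal to `TableScan`, and attributes listed before child operators).

```python
import re

# A vertex header is a line consisting only of e.g. "Map 1", "Reducer 7" or "Union 3".
VERTEX = re.compile(r"^(Map|Reducer|Union) \d+$")


def operators_missing_stats(plan_text):
    """Return (vertex, operator) pairs that have no Statistics line of their own."""
    ops = []        # entries: [vertex, operator, has_stats]
    vertex = None   # vertex currently being read
    current = None  # most recently seen operator inside that vertex
    for raw in plan_text.splitlines():
        line = raw.strip()
        if line.startswith("Stage:"):
            vertex, current = None, None
        elif VERTEX.match(line):
            vertex, current = line, None
        elif line.endswith(" Operator") or line == "TableScan":
            # Operators outside a vertex (e.g. the Fetch Operator) are ignored.
            current = [vertex, line, False] if vertex else None
            if current:
                ops.append(current)
        elif line.startswith("Statistics:") and current:
            current[2] = True
    return [(v, op) for v, op, has_stats in ops if not has_stats]


sample = """
        Reducer 4
            Reduce Operator Tree:
              Group By Operator
                mode: mergepartial
                Statistics: Num rows: 1 Data size: 8 Basic stats: COMPLETE Column stats: COMPLETE
        Reducer 7
            Reduce Operator Tree:
              Merge Join Operator
                condition map:
                     Inner Join 0 to 1
                Filter Operator
                  predicate: ((_col1 = _col27) and (_col8 = _col34)) (type: boolean)
"""

for vertex, op in operators_missing_stats(sample):
    print(vertex, "-", op)
```

The sketch only checks whether a `Statistics:` line is present; it does not detect implausible values such as `Data size: 0` paired with a large row count.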
      

      The full explain plan

      STAGE DEPENDENCIES:
        Stage-1 is a root stage
        Stage-0 depends on stages: Stage-1
      
      STAGE PLANS:
        Stage: Stage-1
          Tez
            Edges:
              Reducer 2 <- Map 1 (SIMPLE_EDGE), Map 5 (SIMPLE_EDGE), Union 3 (CONTAINS)
              Reducer 4 <- Union 3 (SIMPLE_EDGE)
              Reducer 7 <- Map 6 (SIMPLE_EDGE), Map 8 (SIMPLE_EDGE), Union 3 (CONTAINS)
            DagName: mmokhtar_20150214132727_95878ea1-ee6a-4b7e-bc86-843abd5cf664:7
            Vertices:
              Map 1
                  Map Operator Tree:
                      TableScan
                        alias: a
                        filterExpr: (ss_item_sk is not null and ss_ticket_number is not null) (type: boolean)
                        Statistics: Num rows: 550076554 Data size: 47370018896 Basic stats: COMPLETE Column stats: COMPLETE
                        Filter Operator
                          predicate: (ss_item_sk is not null and ss_ticket_number is not null) (type: boolean)
                          Statistics: Num rows: 550076554 Data size: 6549093948 Basic stats: COMPLETE Column stats: COMPLETE
                          Reduce Output Operator
                            key expressions: ss_item_sk (type: int), ss_ticket_number (type: int)
                            sort order: ++
                            Map-reduce partition columns: ss_item_sk (type: int), ss_ticket_number (type: int)
                            Statistics: Num rows: 550076554 Data size: 6549093948 Basic stats: COMPLETE Column stats: COMPLETE
                            value expressions: ss_store_sk (type: int)
              Map 5
                  Map Operator Tree:
                      TableScan
                        alias: b
                        filterExpr: (sr_item_sk is not null and sr_ticket_number is not null) (type: boolean)
                        Statistics: Num rows: 55578005 Data size: 4155315616 Basic stats: COMPLETE Column stats: COMPLETE
                        Filter Operator
                          predicate: (sr_item_sk is not null and sr_ticket_number is not null) (type: boolean)
                          Statistics: Num rows: 55578005 Data size: 444624040 Basic stats: COMPLETE Column stats: COMPLETE
                          Reduce Output Operator
                            key expressions: sr_item_sk (type: int), sr_ticket_number (type: int)
                            sort order: ++
                            Map-reduce partition columns: sr_item_sk (type: int), sr_ticket_number (type: int)
                            Statistics: Num rows: 55578005 Data size: 444624040 Basic stats: COMPLETE Column stats: COMPLETE
              Map 6
                  Map Operator Tree:
                      TableScan
                        alias: c
                        filterExpr: (ss_item_sk is not null and ss_ticket_number is not null) (type: boolean)
                        Statistics: Num rows: 550076554 Data size: 47370018896 Basic stats: COMPLETE Column stats: COMPLETE
                        Filter Operator
                          predicate: (ss_item_sk is not null and ss_ticket_number is not null) (type: boolean)
                          Statistics: Num rows: 550076554 Data size: 6549093948 Basic stats: COMPLETE Column stats: COMPLETE
                          Reduce Output Operator
                            key expressions: ss_item_sk (type: int), ss_ticket_number (type: int)
                            sort order: ++
                            Map-reduce partition columns: ss_item_sk (type: int), ss_ticket_number (type: int)
                            Statistics: Num rows: 550076554 Data size: 6549093948 Basic stats: COMPLETE Column stats: COMPLETE
                            value expressions: ss_store_sk (type: int)
              Map 8
                  Map Operator Tree:
                      TableScan
                        alias: d
                        filterExpr: (sr_item_sk is not null and sr_ticket_number is not null) (type: boolean)
                        Statistics: Num rows: 55578005 Data size: 4155315616 Basic stats: COMPLETE Column stats: COMPLETE
                        Filter Operator
                          predicate: (sr_item_sk is not null and sr_ticket_number is not null) (type: boolean)
                          Statistics: Num rows: 55578005 Data size: 444624040 Basic stats: COMPLETE Column stats: COMPLETE
                          Reduce Output Operator
                            key expressions: sr_item_sk (type: int), sr_ticket_number (type: int)
                            sort order: ++
                            Map-reduce partition columns: sr_item_sk (type: int), sr_ticket_number (type: int)
                            Statistics: Num rows: 55578005 Data size: 444624040 Basic stats: COMPLETE Column stats: COMPLETE
              Reducer 2
                  Reduce Operator Tree:
                    Merge Join Operator
                      condition map:
                           Inner Join 0 to 1
                      keys:
                        0 ss_item_sk (type: int), ss_ticket_number (type: int)
                        1 sr_item_sk (type: int), sr_ticket_number (type: int)
                      outputColumnNames: _col1, _col6, _col8, _col27, _col34
                      Filter Operator
                        predicate: ((_col1 = _col27) and (_col8 = _col34)) (type: boolean)
                        Select Operator
                          expressions: _col1 (type: int), _col8 (type: int), _col6 (type: int)
                          outputColumnNames: _col0, _col1, _col2
                          Group By Operator
                            aggregations: count()
                            keys: _col2 (type: int), _col0 (type: int), _col1 (type: int)
                            mode: hash
                            outputColumnNames: _col0, _col1, _col2, _col3
                            Reduce Output Operator
                              key expressions: _col0 (type: int), _col1 (type: int), _col2 (type: int)
                              sort order: +++
                              Map-reduce partition columns: _col0 (type: int), _col1 (type: int), _col2 (type: int)
                              value expressions: _col3 (type: bigint)
              Reducer 4
                  Reduce Operator Tree:
                    Group By Operator
                      aggregations: count(VALUE._col0)
                      keys: KEY._col0 (type: int), KEY._col1 (type: int), KEY._col2 (type: int)
                      mode: mergepartial
                      outputColumnNames: _col0, _col1, _col2, _col3
                      Statistics: Num rows: 1 Data size: 8 Basic stats: COMPLETE Column stats: COMPLETE
                      Filter Operator
                        predicate: (_col3 > 100000000) (type: boolean)
                        Statistics: Num rows: 0 Data size: 0 Basic stats: NONE Column stats: COMPLETE
                        Select Operator
                          expressions: _col3 (type: bigint)
                          outputColumnNames: _col0
                          Statistics: Num rows: 0 Data size: 0 Basic stats: NONE Column stats: COMPLETE
                          File Output Operator
                            compressed: false
                            Statistics: Num rows: 0 Data size: 0 Basic stats: NONE Column stats: COMPLETE
                            table:
                                input format: org.apache.hadoop.mapred.TextInputFormat
                                output format: org.apache.hadoop.hive.ql.io.HiveIgnoreKeyTextOutputFormat
                                serde: org.apache.hadoop.hive.serde2.lazy.LazySimpleSerDe
              Reducer 7
                  Reduce Operator Tree:
                    Merge Join Operator
                      condition map:
                           Inner Join 0 to 1
                      keys:
                        0 ss_item_sk (type: int), ss_ticket_number (type: int)
                        1 sr_item_sk (type: int), sr_ticket_number (type: int)
                      outputColumnNames: _col1, _col6, _col8, _col27, _col34
                      Filter Operator
                        predicate: ((_col1 = _col27) and (_col8 = _col34)) (type: boolean)
                        Select Operator
                          expressions: _col1 (type: int), _col8 (type: int), _col6 (type: int)
                          outputColumnNames: _col0, _col1, _col2
                          Group By Operator
                            aggregations: count()
                            keys: _col2 (type: int), _col0 (type: int), _col1 (type: int)
                            mode: hash
                            outputColumnNames: _col0, _col1, _col2, _col3
                            Reduce Output Operator
                              key expressions: _col0 (type: int), _col1 (type: int), _col2 (type: int)
                              sort order: +++
                              Map-reduce partition columns: _col0 (type: int), _col1 (type: int), _col2 (type: int)
                              value expressions: _col3 (type: bigint)
              Union 3
                  Vertex: Union 3
      
        Stage: Stage-0
          Fetch Operator
            limit: -1
            Processor Tree:
              ListSink
      

      TPC-DS Q54 also fails with OOM; this failure happens when we choose a different plan.
      The OOM happens in vertexName=Map 14.
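
In the Q54 plan, Map 10 and Map 12 (the inputs to Union 11) carry no statistics, Union 11 is broadcast into Map 14, and the Map Join in Map 14 reports `Num rows: 79189328781 Data size: 0`. The sketch below shows how a check that looks only at the estimated size could accept such an input for broadcast. It is an illustration under stated assumptions and not Hive's implementation or the fix applied for this issue: both function names are hypothetical, and the threshold is an assumed value in the spirit of `hive.auto.convert.join.noconditionaltask.size`.

```python
def fits_in_memory(data_size_bytes, threshold_bytes=10_000_000):
    """Naive size-only check: a missing estimate reported as 0 bytes always passes."""
    return data_size_bytes <= threshold_bytes


def fits_in_memory_guarded(num_rows, data_size_bytes, threshold_bytes=10_000_000):
    """Hypothetical guard: a zero size with a non-zero row count is treated as unknown."""
    if data_size_bytes <= 0 and num_rows > 0:
        return False
    return data_size_bytes <= threshold_bytes


# Row count and size as reported on the Map Join in Map 14.
print(fits_in_memory(0))                        # True
print(fits_in_memory_guarded(79189328781, 0))   # False

# The filtered item table in Map 14 (4200 rows, 16800 bytes) is a genuinely small input.
print(fits_in_memory_guarded(4200, 16800))      # True
```

The guarded variant is only one possible way to avoid trusting a zero-byte estimate; the report itself does not state how the planner reaches the broadcast decision.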

      explain  
      with my_customers as (
       select  c_customer_sk
              , c_current_addr_sk
       from   
              ( select cs_sold_date_sk sold_date_sk,
                       cs_bill_customer_sk customer_sk,
                       cs_item_sk item_sk
                from   catalog_sales
                union all
                select ws_sold_date_sk sold_date_sk,
                       ws_bill_customer_sk customer_sk,
                       ws_item_sk item_sk
                from   web_sales
               ) cs_or_ws_sales,
               item,
               date_dim,
               customer
       where   sold_date_sk = d_date_sk
               and item_sk = i_item_sk
               and i_category = 'Jewelry'
               and i_class = 'football'
               and c_customer_sk = cs_or_ws_sales.customer_sk
               and d_moy = 3
               and d_year = 2000
               group by  c_customer_sk
              , c_current_addr_sk
       )
       , my_revenue as (
       select c_customer_sk,
              sum(ss_ext_sales_price) as revenue
       from   my_customers,
              store_sales,
              customer_address,
              store,
              date_dim
       where  c_current_addr_sk = ca_address_sk
              and ca_county = s_county
              and ca_state = s_state
              and ss_sold_date_sk = d_date_sk
              and c_customer_sk = ss_customer_sk
              and d_month_seq between (1203)
                                 and  (1205)
       group by c_customer_sk
       )
       , segments as
       (select cast((revenue/50) as int) as segment
        from   my_revenue
       )
        select  segment, count(*) as num_customers, segment*50 as segment_base
       from segments
       group by segment
       order by segment, num_customers
       limit 100
      OK
      STAGE DEPENDENCIES:
        Stage-1 is a root stage
        Stage-0 depends on stages: Stage-1
      
      STAGE PLANS:
        Stage: Stage-1
          Tez
            Edges:
              Map 1 <- Map 5 (BROADCAST_EDGE), Map 6 (BROADCAST_EDGE)
              Map 10 <- Map 13 (BROADCAST_EDGE), Union 11 (CONTAINS)
              Map 12 <- Map 13 (BROADCAST_EDGE), Union 11 (CONTAINS)
              Map 14 <- Union 11 (BROADCAST_EDGE)
              Map 6 <- Map 7 (BROADCAST_EDGE), Reducer 9 (BROADCAST_EDGE)
              Map 8 <- Map 14 (BROADCAST_EDGE)
              Reducer 2 <- Map 1 (SIMPLE_EDGE)
              Reducer 3 <- Reducer 2 (SIMPLE_EDGE)
              Reducer 4 <- Reducer 3 (SIMPLE_EDGE)
              Reducer 9 <- Map 8 (SIMPLE_EDGE)
            DagName: mmokhtar_20150208232525_9976b56b-8f4b-48c8-a909-aa653c20051c:1
            Vertices:
              Map 1 
                  Map Operator Tree:
                      TableScan
                        alias: store_sales
                        filterExpr: ss_customer_sk is not null (type: boolean)
                        Statistics: Num rows: 82510879939 Data size: 6873789738208 Basic stats: COMPLETE Column stats: COMPLETE
                        Filter Operator
                          predicate: ss_customer_sk is not null (type: boolean)
                          Statistics: Num rows: 80566020964 Data size: 951594129356 Basic stats: COMPLETE Column stats: COMPLETE
                          Select Operator
                            expressions: ss_customer_sk (type: int), ss_ext_sales_price (type: float), ss_sold_date_sk (type: int)
                            outputColumnNames: _col0, _col1, _col2
                            Statistics: Num rows: 80566020964 Data size: 951594129356 Basic stats: COMPLETE Column stats: COMPLETE
                            Map Join Operator
                              condition map:
                                   Inner Join 0 to 1
                              keys:
                                0 _col2 (type: int)
                                1 _col0 (type: int)
                              outputColumnNames: _col0, _col1
                              input vertices:
                                1 Map 5
                              Statistics: Num rows: 90081226648 Data size: 720649813184 Basic stats: COMPLETE Column stats: COMPLETE
                              Map Join Operator
                                condition map:
                                     Inner Join 0 to 1
                                keys:
                                  0 _col0 (type: int)
                                  1 _col5 (type: int)
                                outputColumnNames: _col1, _col10
                                input vertices:
                                  1 Map 6
                                Statistics: Num rows: 99089351460 Data size: 792714811684 Basic stats: COMPLETE Column stats: NONE
                                Select Operator
                                  expressions: _col10 (type: int), _col1 (type: float)
                                  outputColumnNames: _col0, _col1
                                  Statistics: Num rows: 99089351460 Data size: 792714811684 Basic stats: COMPLETE Column stats: NONE
                                  Group By Operator
                                    aggregations: sum(_col1)
                                    keys: _col0 (type: int)
                                    mode: hash
                                    outputColumnNames: _col0, _col1
                                    Statistics: Num rows: 99089351460 Data size: 792714811684 Basic stats: COMPLETE Column stats: NONE
                                    Reduce Output Operator
                                      key expressions: _col0 (type: int)
                                      sort order: +
                                      Map-reduce partition columns: _col0 (type: int)
                                      Statistics: Num rows: 99089351460 Data size: 792714811684 Basic stats: COMPLETE Column stats: NONE
                                      value expressions: _col1 (type: double)
                  Execution mode: vectorized
              Map 10 
                  Map Operator Tree:
                      TableScan
                        alias: catalog_sales
                        filterExpr: (cs_item_sk is not null and cs_bill_customer_sk is not null) (type: boolean)
                        Filter Operator
                          predicate: (cs_item_sk is not null and cs_bill_customer_sk is not null) (type: boolean)
                          Select Operator
                            expressions: cs_sold_date_sk (type: int), cs_bill_customer_sk (type: int), cs_item_sk (type: int)
                            outputColumnNames: _col0, _col1, _col2
                            Map Join Operator
                              condition map:
                                   Inner Join 0 to 1
                              keys:
                                0 _col0 (type: int)
                                1 _col0 (type: int)
                              outputColumnNames: _col1, _col2
                              input vertices:
                                1 Map 13
                              Reduce Output Operator
                                key expressions: _col2 (type: int)
                                sort order: +
                                Map-reduce partition columns: _col2 (type: int)
                                value expressions: _col1 (type: int)
                  Execution mode: vectorized
              Map 12 
                  Map Operator Tree:
                      TableScan
                        alias: web_sales
                        filterExpr: (ws_item_sk is not null and ws_bill_customer_sk is not null) (type: boolean)
                        Filter Operator
                          predicate: (ws_item_sk is not null and ws_bill_customer_sk is not null) (type: boolean)
                          Select Operator
                            expressions: ws_sold_date_sk (type: int), ws_bill_customer_sk (type: int), ws_item_sk (type: int)
                            outputColumnNames: _col0, _col1, _col2
                            Map Join Operator
                              condition map:
                                   Inner Join 0 to 1
                              keys:
                                0 _col0 (type: int)
                                1 _col0 (type: int)
                              outputColumnNames: _col1, _col2
                              input vertices:
                                1 Map 13
                              Reduce Output Operator
                                key expressions: _col2 (type: int)
                                sort order: +
                                Map-reduce partition columns: _col2 (type: int)
                                value expressions: _col1 (type: int)
                  Execution mode: vectorized
              Map 13 
                  Map Operator Tree:
                      TableScan
                        alias: date_dim
                        filterExpr: (((d_moy = 3) and (d_year = 2000)) and d_date_sk is not null) (type: boolean)
                        Statistics: Num rows: 73049 Data size: 81741831 Basic stats: COMPLETE Column stats: COMPLETE
                        Filter Operator
                          predicate: (((d_moy = 3) and (d_year = 2000)) and d_date_sk is not null) (type: boolean)
                          Statistics: Num rows: 624 Data size: 7488 Basic stats: COMPLETE Column stats: COMPLETE
                          Select Operator
                            expressions: d_date_sk (type: int)
                            outputColumnNames: _col0
                            Statistics: Num rows: 624 Data size: 2496 Basic stats: COMPLETE Column stats: COMPLETE
                            Reduce Output Operator
                              key expressions: _col0 (type: int)
                              sort order: +
                              Map-reduce partition columns: _col0 (type: int)
                              Statistics: Num rows: 624 Data size: 2496 Basic stats: COMPLETE Column stats: COMPLETE
                            Select Operator
                              expressions: _col0 (type: int)
                              outputColumnNames: _col0
                              Statistics: Num rows: 624 Data size: 2496 Basic stats: COMPLETE Column stats: COMPLETE
                              Group By Operator
                                keys: _col0 (type: int)
                                mode: hash
                                outputColumnNames: _col0
                                Statistics: Num rows: 312 Data size: 1248 Basic stats: COMPLETE Column stats: COMPLETE
                                Dynamic Partitioning Event Operator
                                  Target Input: catalog_sales
                                  Partition key expr: cs_sold_date_sk
                                  Statistics: Num rows: 312 Data size: 1248 Basic stats: COMPLETE Column stats: COMPLETE
                                  Target column: cs_sold_date_sk
                                  Target Vertex: Map 10
                            Select Operator
                              expressions: _col0 (type: int)
                              outputColumnNames: _col0
                              Statistics: Num rows: 624 Data size: 2496 Basic stats: COMPLETE Column stats: COMPLETE
                              Group By Operator
                                keys: _col0 (type: int)
                                mode: hash
                                outputColumnNames: _col0
                                Statistics: Num rows: 312 Data size: 1248 Basic stats: COMPLETE Column stats: COMPLETE
                                Dynamic Partitioning Event Operator
                                  Target Input: web_sales
                                  Partition key expr: ws_sold_date_sk
                                  Statistics: Num rows: 312 Data size: 1248 Basic stats: COMPLETE Column stats: COMPLETE
                                  Target column: ws_sold_date_sk
                                  Target Vertex: Map 12
                            Reduce Output Operator
                              key expressions: _col0 (type: int)
                              sort order: +
                              Map-reduce partition columns: _col0 (type: int)
                              Statistics: Num rows: 624 Data size: 2496 Basic stats: COMPLETE Column stats: COMPLETE
                  Execution mode: vectorized
              Map 14 
                  Map Operator Tree:
                      TableScan
                        alias: item
                        filterExpr: (((i_category = 'Jewelry') and (i_class = 'football')) and i_item_sk is not null) (type: boolean)
                        Statistics: Num rows: 462000 Data size: 663862160 Basic stats: COMPLETE Column stats: COMPLETE
                        Filter Operator
                          predicate: (((i_category = 'Jewelry') and (i_class = 'football')) and i_item_sk is not null) (type: boolean)
                          Statistics: Num rows: 4200 Data size: 781200 Basic stats: COMPLETE Column stats: COMPLETE
                          Select Operator
                            expressions: i_item_sk (type: int)
                            outputColumnNames: _col0
                            Statistics: Num rows: 4200 Data size: 16800 Basic stats: COMPLETE Column stats: COMPLETE
                            Map Join Operator
                              condition map:
                                   Inner Join 0 to 1
                              keys:
                                0 _col2 (type: int)
                                1 _col0 (type: int)
                              outputColumnNames: _col1
                              input vertices:
                                0 Union 11
                              Statistics: Num rows: 79189328781 Data size: 0 Basic stats: PARTIAL Column stats: NONE
                              Reduce Output Operator
                                key expressions: _col1 (type: int)
                                sort order: +
                                Map-reduce partition columns: _col1 (type: int)
                                Statistics: Num rows: 79189328781 Data size: 0 Basic stats: PARTIAL Column stats: NONE
                  Execution mode: vectorized
              Map 5 
                  Map Operator Tree:
                      TableScan
                        alias: date_dim
                        filterExpr: (d_month_seq BETWEEN 1203 AND 1205 and d_date_sk is not null) (type: boolean)
                        Statistics: Num rows: 73049 Data size: 81741831 Basic stats: COMPLETE Column stats: COMPLETE
                        Filter Operator
                          predicate: (d_month_seq BETWEEN 1203 AND 1205 and d_date_sk is not null) (type: boolean)
                          Statistics: Num rows: 36524 Data size: 292192 Basic stats: COMPLETE Column stats: COMPLETE
                          Select Operator
                            expressions: d_date_sk (type: int)
                            outputColumnNames: _col0
                            Statistics: Num rows: 36524 Data size: 146096 Basic stats: COMPLETE Column stats: COMPLETE
                            Reduce Output Operator
                              key expressions: _col0 (type: int)
                              sort order: +
                              Map-reduce partition columns: _col0 (type: int)
                              Statistics: Num rows: 36524 Data size: 146096 Basic stats: COMPLETE Column stats: COMPLETE
                            Select Operator
                              expressions: _col0 (type: int)
                              outputColumnNames: _col0
                              Statistics: Num rows: 36524 Data size: 146096 Basic stats: COMPLETE Column stats: COMPLETE
                              Group By Operator
                                keys: _col0 (type: int)
                                mode: hash
                                outputColumnNames: _col0
                                Statistics: Num rows: 18262 Data size: 73048 Basic stats: COMPLETE Column stats: COMPLETE
                                Dynamic Partitioning Event Operator
                                  Target Input: store_sales
                                  Partition key expr: ss_sold_date_sk
                                  Statistics: Num rows: 18262 Data size: 73048 Basic stats: COMPLETE Column stats: COMPLETE
                                  Target column: ss_sold_date_sk
                                  Target Vertex: Map 1
                  Execution mode: vectorized
              Map 6 
                  Map Operator Tree:
                      TableScan
                        alias: customer_address
                        filterExpr: ((ca_county is not null and ca_state is not null) and ca_address_sk is not null) (type: boolean)
                        Statistics: Num rows: 40000000 Data size: 40595195284 Basic stats: COMPLETE Column stats: COMPLETE
                        Filter Operator
                          predicate: ((ca_county is not null and ca_state is not null) and ca_address_sk is not null) (type: boolean)
                          Statistics: Num rows: 40000000 Data size: 7520000000 Basic stats: COMPLETE Column stats: COMPLETE
                          Select Operator
                            expressions: ca_address_sk (type: int), ca_county (type: string), ca_state (type: string)
                            outputColumnNames: _col0, _col1, _col2
                            Statistics: Num rows: 40000000 Data size: 7520000000 Basic stats: COMPLETE Column stats: COMPLETE
                            Map Join Operator
                              condition map:
                                   Inner Join 0 to 1
                              keys:
                                0 _col1 (type: string), _col2 (type: string)
                                1 _col0 (type: string), _col1 (type: string)
                              outputColumnNames: _col0
                              input vertices:
                                1 Map 7
                              Statistics: Num rows: 778829 Data size: 3115316 Basic stats: COMPLETE Column stats: COMPLETE
                              Map Join Operator
                                condition map:
                                     Inner Join 0 to 1
                                keys:
                                  0 _col0 (type: int)
                                  1 _col1 (type: int)
                                outputColumnNames: _col5
                                input vertices:
                                  1 Reducer 9
                                Statistics: Num rows: 47909545988 Data size: 0 Basic stats: PARTIAL Column stats: NONE
                                Reduce Output Operator
                                  key expressions: _col5 (type: int)
                                  sort order: +
                                  Map-reduce partition columns: _col5 (type: int)
                                  Statistics: Num rows: 47909545988 Data size: 0 Basic stats: PARTIAL Column stats: NONE
                  Execution mode: vectorized
              Map 7 
                  Map Operator Tree:
                      TableScan
                        alias: store
                        filterExpr: (s_county is not null and s_state is not null) (type: boolean)
                        Statistics: Num rows: 1704 Data size: 3256276 Basic stats: COMPLETE Column stats: COMPLETE
                        Filter Operator
                          predicate: (s_county is not null and s_state is not null) (type: boolean)
                          Statistics: Num rows: 1704 Data size: 313536 Basic stats: COMPLETE Column stats: COMPLETE
                          Select Operator
                            expressions: s_county (type: string), s_state (type: string)
                            outputColumnNames: _col0, _col1
                            Statistics: Num rows: 1704 Data size: 313536 Basic stats: COMPLETE Column stats: COMPLETE
                            Reduce Output Operator
                              key expressions: _col0 (type: string), _col1 (type: string)
                              sort order: ++
                              Map-reduce partition columns: _col0 (type: string), _col1 (type: string)
                              Statistics: Num rows: 1704 Data size: 313536 Basic stats: COMPLETE Column stats: COMPLETE
                  Execution mode: vectorized
              Map 8 
                  Map Operator Tree:
                      TableScan
                        alias: customer
                        filterExpr: (c_customer_sk is not null and c_current_addr_sk is not null) (type: boolean)
                        Statistics: Num rows: 80000000 Data size: 68801615852 Basic stats: COMPLETE Column stats: COMPLETE
                        Filter Operator
                          predicate: (c_customer_sk is not null and c_current_addr_sk is not null) (type: boolean)
                          Statistics: Num rows: 80000000 Data size: 640000000 Basic stats: COMPLETE Column stats: COMPLETE
                          Select Operator
                            expressions: c_customer_sk (type: int), c_current_addr_sk (type: int)
                            outputColumnNames: _col0, _col1
                            Statistics: Num rows: 80000000 Data size: 640000000 Basic stats: COMPLETE Column stats: COMPLETE
                            Map Join Operator
                              condition map:
                                   Inner Join 0 to 1
                              keys:
                                0 _col0 (type: int)
                                1 _col1 (type: int)
                              outputColumnNames: _col0, _col1
                              input vertices:
                                1 Map 14
                              Statistics: Num rows: 87108263547 Data size: 0 Basic stats: PARTIAL Column stats: NONE
                              Group By Operator
                                keys: _col0 (type: int), _col1 (type: int)
                                mode: hash
                                outputColumnNames: _col0, _col1
                                Statistics: Num rows: 87108263547 Data size: 0 Basic stats: PARTIAL Column stats: NONE
                                Reduce Output Operator
                                  key expressions: _col0 (type: int), _col1 (type: int)
                                  sort order: ++
                                  Map-reduce partition columns: _col0 (type: int), _col1 (type: int)
                                  Statistics: Num rows: 87108263547 Data size: 0 Basic stats: PARTIAL Column stats: NONE
                  Execution mode: vectorized
              Reducer 2 
                  Reduce Operator Tree:
                    Group By Operator
                      aggregations: sum(VALUE._col0)
                      keys: KEY._col0 (type: int)
                      mode: mergepartial
                      outputColumnNames: _col0, _col1
                      Statistics: Num rows: 49544675730 Data size: 396357405842 Basic stats: COMPLETE Column stats: NONE
                      Select Operator
                        expressions: UDFToInteger((_col1 / 50.0)) (type: int)
                        outputColumnNames: _col0
                        Statistics: Num rows: 49544675730 Data size: 396357405842 Basic stats: COMPLETE Column stats: NONE
                        Group By Operator
                          aggregations: count()
                          keys: _col0 (type: int)
                          mode: hash
                          outputColumnNames: _col0, _col1
                          Statistics: Num rows: 49544675730 Data size: 396357405842 Basic stats: COMPLETE Column stats: NONE
                          Reduce Output Operator
                            key expressions: _col0 (type: int)
                            sort order: +
                            Map-reduce partition columns: _col0 (type: int)
                            Statistics: Num rows: 49544675730 Data size: 396357405842 Basic stats: COMPLETE Column stats: NONE
                            value expressions: _col1 (type: bigint)
              Reducer 3 
                  Reduce Operator Tree:
                    Group By Operator
                      aggregations: count(VALUE._col0)
                      keys: KEY._col0 (type: int)
                      mode: mergepartial
                      outputColumnNames: _col0, _col1
                      Statistics: Num rows: 24772337865 Data size: 198178702921 Basic stats: COMPLETE Column stats: NONE
                      Select Operator
                        expressions: _col0 (type: int), _col1 (type: bigint), (_col0 * 50) (type: int)
                        outputColumnNames: _col0, _col1, _col2
                        Statistics: Num rows: 24772337865 Data size: 198178702921 Basic stats: COMPLETE Column stats: NONE
                        Reduce Output Operator
                          key expressions: _col0 (type: int), _col1 (type: bigint)
                          sort order: ++
                          Statistics: Num rows: 24772337865 Data size: 198178702921 Basic stats: COMPLETE Column stats: NONE
                          TopN Hash Memory Usage: 0.04
                          value expressions: _col2 (type: int)
              Reducer 4 
                  Reduce Operator Tree:
                    Select Operator
                      expressions: KEY.reducesinkkey0 (type: int), KEY.reducesinkkey1 (type: bigint), VALUE._col0 (type: int)
                      outputColumnNames: _col0, _col1, _col2
                      Statistics: Num rows: 24772337865 Data size: 198178702921 Basic stats: COMPLETE Column stats: NONE
                      Limit
                        Number of rows: 100
                        Statistics: Num rows: 100 Data size: 800 Basic stats: COMPLETE Column stats: NONE
                        File Output Operator
                          compressed: false
                          Statistics: Num rows: 100 Data size: 800 Basic stats: COMPLETE Column stats: NONE
                          table:
                              input format: org.apache.hadoop.mapred.TextInputFormat
                              output format: org.apache.hadoop.hive.ql.io.HiveIgnoreKeyTextOutputFormat
                              serde: org.apache.hadoop.hive.serde2.lazy.LazySimpleSerDe
              Reducer 9 
                  Reduce Operator Tree:
                    Group By Operator
                      keys: KEY._col0 (type: int), KEY._col1 (type: int)
                      mode: mergepartial
                      outputColumnNames: _col0, _col1
                      Statistics: Num rows: 43554131773 Data size: 0 Basic stats: PARTIAL Column stats: NONE
                      Reduce Output Operator
                        key expressions: _col1 (type: int)
                        sort order: +
                        Map-reduce partition columns: _col1 (type: int)
                        Statistics: Num rows: 43554131773 Data size: 0 Basic stats: PARTIAL Column stats: NONE
                        value expressions: _col0 (type: int)
              Union 11 
                  Vertex: Union 11
      
        Stage: Stage-0
          Fetch Operator
            limit: 100
            Processor Tree:
              ListSink
      

      In Map 14, the Map Join Operator and the Reduce Output Operator both report Data size: 0 with Basic stats: PARTIAL:

              Map 14 
                  Map Operator Tree:
                      TableScan
                        alias: item
                        filterExpr: (((i_category = 'Jewelry') and (i_class = 'football')) and i_item_sk is not null) (type: boolean)
                        Statistics: Num rows: 462000 Data size: 663862160 Basic stats: COMPLETE Column stats: COMPLETE
                        Filter Operator
                          predicate: (((i_category = 'Jewelry') and (i_class = 'football')) and i_item_sk is not null) (type: boolean)
                          Statistics: Num rows: 4200 Data size: 781200 Basic stats: COMPLETE Column stats: COMPLETE
                          Select Operator
                            expressions: i_item_sk (type: int)
                            outputColumnNames: _col0
                            Statistics: Num rows: 4200 Data size: 16800 Basic stats: COMPLETE Column stats: COMPLETE
                            Map Join Operator
                              condition map:
                                   Inner Join 0 to 1
                              keys:
                                0 _col2 (type: int)
                                1 _col0 (type: int)
                              outputColumnNames: _col1
                              input vertices:
                                0 Union 11
                              Statistics: Num rows: 79189328781 Data size: 0 Basic stats: PARTIAL Column stats: NONE
                              Reduce Output Operator
                                key expressions: _col1 (type: int)
                                sort order: +
                                Map-reduce partition columns: _col1 (type: int)
                                Statistics: Num rows: 79189328781 Data size: 0 Basic stats: PARTIAL Column stats: NONE
                  Execution mode: vectorized
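
      The missing statistics above can be located mechanically in a long plan. The following is a minimal sketch, not part of Hive: it assumes the EXPLAIN output is available as plain text and that each operator prints a "Statistics:" line in the format shown in this report. The function name find_zero_size_operators and the sample_plan string are hypothetical and used only for illustration.

```python
import re

# Matches the "Statistics:" line that appears under each operator in the plan text.
STATS_RE = re.compile(r"Statistics: Num rows: (\d+) Data size: (\d+) Basic stats: (\w+)")


def find_zero_size_operators(plan_text):
    """Return (operator, num_rows) pairs where Num rows > 0 but Data size is 0."""
    suspicious = []
    operator = None  # most recent operator header seen while scanning
    for line in plan_text.splitlines():
        stripped = line.strip()
        match = STATS_RE.search(stripped)
        if match:
            num_rows = int(match.group(1))
            data_size = int(match.group(2))
            if num_rows > 0 and data_size == 0:
                suspicious.append((operator, num_rows))
        elif stripped.endswith("Operator") or stripped == "TableScan":
            # Simplification: headers such as "Limit" are not tracked here.
            operator = stripped
    return suspicious


# Lines copied from the Map 14 snippet in this report.
sample_plan = """\
Select Operator
  Statistics: Num rows: 4200 Data size: 16800 Basic stats: COMPLETE Column stats: COMPLETE
  Map Join Operator
    Statistics: Num rows: 79189328781 Data size: 0 Basic stats: PARTIAL Column stats: NONE
    Reduce Output Operator
      Statistics: Num rows: 79189328781 Data size: 0 Basic stats: PARTIAL Column stats: NONE
"""

for op, rows in find_zero_size_operators(sample_plan):
    print(f"{op}: {rows} rows, Data size 0")
```

      Run against the full plan, a check like this would also be expected to flag the operators in Map 6, Map 8 and Reducer 9, which show the same Data size: 0 pattern.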
      

          People

            pxiong Pengcheng Xiong
            mmokhtar Mostafa Mokhtar