HIVE-8454 (sub-task of HIVE-5369: Annotate hive operator tree with statistics from metastore)

Select Operator does not rename column stats properly in case of select star

Details

    • Type: Sub-task
    • Status: Closed
    • Priority: Critical
    • Resolution: Fixed
    • Affects Version/s: 0.14.0
    • Fix Version/s: 0.14.0
    • Component/s: Physical Optimizer
    • Labels: None

    Description

      The estimated data size of some Select Operators is 0. BytesBytesMultiHashMap uses the data size to estimate the initial number of entries in the hash map. If this data size is 0, an exception is thrown (see the stack trace after the query).
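
To make the failure mode concrete, here is a minimal sketch of how a data size estimate can collapse to 0 when column statistics are still keyed by the input column name while the Select Operator emits a renamed output column. This is an illustrative model only, not Hive's statistics annotation code: the class `SelectStatsSketch`, the methods `dataSize` and `renameStats`, and the per-column average width map are all hypothetical. The numbers (652 rows, a 4-byte int column) are taken from the Map 1 vertex in the plan.

```java
import java.util.Collections;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Hypothetical model of data size estimation; not Hive's actual implementation.
final class SelectStatsSketch {

    // Estimated data size = row count * sum of average widths of the output
    // columns that have statistics. Columns without stats contribute nothing.
    static long dataSize(long numRows, List<String> outputColumns,
                         Map<String, Long> avgWidthByColumn) {
        long rowWidth = 0;
        for (String col : outputColumns) {
            Long width = avgWidthByColumn.get(col);
            if (width != null) {
                rowWidth += width;
            }
        }
        return numRows * rowWidth;
    }

    // Re-key statistics from input column names to output column names.
    // outputToInput maps each output name to the input name it is derived from.
    static Map<String, Long> renameStats(Map<String, Long> inputStats,
                                         Map<String, String> outputToInput) {
        Map<String, Long> renamed = new HashMap<>();
        for (Map.Entry<String, String> e : outputToInput.entrySet()) {
            Long width = inputStats.get(e.getValue());
            if (width != null) {
                renamed.put(e.getKey(), width);
            }
        }
        return renamed;
    }

    public static void main(String[] args) {
        Map<String, Long> inputStats = Collections.singletonMap("d_date_sk", 4L);
        Map<String, String> outputToInput = Collections.singletonMap("_col0", "d_date_sk");
        List<String> outputColumns = Collections.singletonList("_col0");

        // Stats not renamed: the lookup for "_col0" misses, so the estimate is 0.
        System.out.println(dataSize(652, outputColumns, inputStats)); // prints 0

        // Stats renamed to the output name: 652 rows * 4 bytes.
        System.out.println(dataSize(652, outputColumns,
                renameStats(inputStats, outputToInput))); // prints 2608
    }
}
```

With the rename applied, the sketch reproduces the 2608-byte estimate that the plan reports for the first Select Operator in Map 1. Without it, the estimate is 0, which matches the symptom reported for the second one.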
      Query

      select count(*) from
       store_sales
              JOIN store_returns ON store_sales.ss_item_sk = store_returns.sr_item_sk and store_sales.ss_ticket_number = store_returns.sr_ticket_number
              JOIN customer ON store_sales.ss_customer_sk = customer.c_customer_sk
              JOIN date_dim d1 ON store_sales.ss_sold_date_sk = d1.d_date_sk
              JOIN date_dim d2 ON customer.c_first_sales_date_sk = d2.d_date_sk 
              JOIN date_dim d3 ON customer.c_first_shipto_date_sk = d3.d_date_sk
              JOIN store ON store_sales.ss_store_sk = store.s_store_sk
              JOIN item ON store_sales.ss_item_sk = item.i_item_sk
              JOIN customer_demographics cd1 ON store_sales.ss_cdemo_sk = cd1.cd_demo_sk
              JOIN customer_demographics cd2 ON customer.c_current_cdemo_sk = cd2.cd_demo_sk
              JOIN promotion ON store_sales.ss_promo_sk = promotion.p_promo_sk
              JOIN household_demographics hd1 ON store_sales.ss_hdemo_sk = hd1.hd_demo_sk
              JOIN household_demographics hd2 ON customer.c_current_hdemo_sk = hd2.hd_demo_sk
              JOIN customer_address ad1 ON store_sales.ss_addr_sk = ad1.ca_address_sk
              JOIN customer_address ad2 ON customer.c_current_addr_sk = ad2.ca_address_sk
              JOIN income_band ib1 ON hd1.hd_income_band_sk = ib1.ib_income_band_sk
              JOIN income_band ib2 ON hd2.hd_income_band_sk = ib2.ib_income_band_sk
              JOIN
       (select cs_item_sk
              ,sum(cs_ext_list_price) as sale,sum(cr_refunded_cash+cr_reversed_charge+cr_store_credit) as refund
        from catalog_sales JOIN catalog_returns
        ON catalog_sales.cs_item_sk = catalog_returns.cr_item_sk
          and catalog_sales.cs_order_number = catalog_returns.cr_order_number
        group by cs_item_sk
        having sum(cs_ext_list_price)>2*sum(cr_refunded_cash+cr_reversed_charge+cr_store_credit)) cs_ui
      ON store_sales.ss_item_sk = cs_ui.cs_item_sk
        WHERE  
               cd1.cd_marital_status <> cd2.cd_marital_status and
               i_color in ('maroon','burnished','dim','steel','navajo','chocolate') and
               i_current_price between 35 and 35 + 10 and
               i_current_price between 35 + 1 and 35 + 15
      	 and d1.d_year = 2001;
      
      ], TaskAttempt 3 failed, info=[Error: Failure while running task:java.lang.RuntimeException: java.lang.RuntimeException: java.lang.AssertionError: Capacity must be a power of two
      	at org.apache.hadoop.hive.ql.exec.tez.TezProcessor.initializeAndRunProcessor(TezProcessor.java:187)
      	at org.apache.hadoop.hive.ql.exec.tez.TezProcessor.run(TezProcessor.java:142)
      	at org.apache.tez.runtime.LogicalIOProcessorRuntimeTask.run(LogicalIOProcessorRuntimeTask.java:324)
      	at org.apache.tez.runtime.task.TezTaskRunner$TaskRunnerCallable$1.run(TezTaskRunner.java:180)
      	at org.apache.tez.runtime.task.TezTaskRunner$TaskRunnerCallable$1.run(TezTaskRunner.java:172)
      	at java.security.AccessController.doPrivileged(Native Method)
      	at javax.security.auth.Subject.doAs(Subject.java:415)
      	at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1548)
      	at org.apache.tez.runtime.task.TezTaskRunner$TaskRunnerCallable.call(TezTaskRunner.java:172)
      	at org.apache.tez.runtime.task.TezTaskRunner$TaskRunnerCallable.call(TezTaskRunner.java:167)
      	at java.util.concurrent.FutureTask.run(FutureTask.java:262)
      	at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145)
      	at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615)
      	at java.lang.Thread.run(Thread.java:744)
      Caused by: java.lang.RuntimeException: java.lang.AssertionError: Capacity must be a power of two
      	at org.apache.hadoop.hive.ql.exec.tez.MapRecordSource.processRow(MapRecordSource.java:93)
      	at org.apache.hadoop.hive.ql.exec.tez.MapRecordSource.pushRecord(MapRecordSource.java:70)
      	at org.apache.hadoop.hive.ql.exec.tez.MapRecordProcessor.run(MapRecordProcessor.java:273)
      	at org.apache.hadoop.hive.ql.exec.tez.TezProcessor.initializeAndRunProcessor(TezProcessor.java:164)
      	... 13 more
      Caused by: java.lang.AssertionError: Capacity must be a power of two
      	at org.apache.hadoop.hive.ql.exec.persistence.BytesBytesMultiHashMap.validateCapacity(BytesBytesMultiHashMap.java:302)
      	at org.apache.hadoop.hive.ql.exec.persistence.BytesBytesMultiHashMap.<init>(BytesBytesMultiHashMap.java:159)
      	at org.apache.hadoop.hive.ql.exec.persistence.MapJoinBytesTableContainer.<init>(MapJoinBytesTableContainer.java:73)
      	at org.apache.hadoop.hive.ql.exec.persistence.MapJoinBytesTableContainer.<init>(MapJoinBytesTableContainer.java:64)
      	at org.apache.hadoop.hive.ql.exec.tez.HashTableLoader.load(HashTableLoader.java:145)
      	at org.apache.hadoop.hive.ql.exec.MapJoinOperator.loadHashTable(MapJoinOperator.java:201)
      	at org.apache.hadoop.hive.ql.exec.MapJoinOperator.cleanUpInputFileChangedOp(MapJoinOperator.java:236)
      	at org.apache.hadoop.hive.ql.exec.Operator.cleanUpInputFileChanged(Operator.java:1035)
      	at org.apache.hadoop.hive.ql.exec.Operator.cleanUpInputFileChanged(Operator.java:1039)
      	at org.apache.hadoop.hive.ql.exec.Operator.cleanUpInputFileChanged(Operator.java:1039)
      	at org.apache.hadoop.hive.ql.exec.Operator.cleanUpInputFileChanged(Operator.java:1039)
      	at org.apache.hadoop.hive.ql.exec.Operator.cleanUpInputFileChanged(Operator.java:1039)
      	at org.apache.hadoop.hive.ql.exec.Operator.cleanUpInputFileChanged(Operator.java:1039)
      	at org.apache.hadoop.hive.ql.exec.vector.VectorMapOperator.process(VectorMapOperator.java:37)
      	at org.apache.hadoop.hive.ql.exec.tez.MapRecordSource.processRow(MapRecordSource.java:85)
      	... 16 more
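
The assertion at the bottom of the trace says the hash map capacity must be a power of two. The sketch below shows the usual bit-trick for that check and one defensive way to round an estimate up before using it as a capacity. It assumes that the zero data size estimate reaches the capacity unchanged. The class `CapacitySketch` and its methods are hypothetical and are not the code in BytesBytesMultiHashMap.

```java
// Hypothetical helper; illustrates the power-of-two constraint only.
final class CapacitySketch {

    // A positive number is a power of two exactly when it has a single bit set.
    static boolean isPowerOfTwo(long n) {
        return n > 0 && (n & (n - 1)) == 0;
    }

    // Round an estimate up to the nearest power of two, treating 0 or
    // negative estimates as 1. Not overflow-safe for inputs above 2^62.
    static long nextPowerOfTwo(long n) {
        if (n <= 1) {
            return 1;
        }
        return Long.highestOneBit(n - 1) << 1;
    }

    public static void main(String[] args) {
        System.out.println(isPowerOfTwo(0));      // prints false
        System.out.println(isPowerOfTwo(1024));   // prints true
        System.out.println(nextPowerOfTwo(0));    // prints 1
        System.out.println(nextPowerOfTwo(1025)); // prints 2048
    }
}
```

Under that assumption, a capacity of 0 fails the check because 0 has no bits set. Clamping the estimate this way would only hide the symptom, though; the underlying problem in this issue is the incorrect 0 estimate itself.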
      

      Plan

      OK
      STAGE DEPENDENCIES:
        Stage-1 is a root stage
        Stage-0 depends on stages: Stage-1
      
      STAGE PLANS:
        Stage: Stage-1
          Tez
            Edges:
              Map 11 <- Map 10 (BROADCAST_EDGE), Map 20 (BROADCAST_EDGE)
              Map 12 <- Map 14 (BROADCAST_EDGE)
              Map 16 <- Map 4 (BROADCAST_EDGE), Map 7 (BROADCAST_EDGE)
              Map 4 <- Map 1 (BROADCAST_EDGE), Map 15 (BROADCAST_EDGE), Map 18 (BROADCAST_EDGE), Map 19 (BROADCAST_EDGE), Map 2 (BROADCAST_EDGE), Map 3 (BROADCAST_EDGE), Map 5 (BROADCAST_EDGE), Map 6 (BROADCAST_EDGE), Map 8 (BROADCAST_EDGE), Map 9 (BROADCAST_EDGE), Reducer 13 (BROADCAST_EDGE)
              Map 5 <- Map 11 (BROADCAST_EDGE)
              Map 6 <- Map 21 (BROADCAST_EDGE)
              Reducer 13 <- Map 12 (SIMPLE_EDGE)
              Reducer 17 <- Map 16 (SIMPLE_EDGE)
            DagName: mmokhtar_20141013195656_e993c552-4b66-4bc4-8f22-3ca49c8727bb:14
            Vertices:
              Map 1
                  Map Operator Tree:
                      TableScan
                        alias: d1
                        filterExpr: ((d_year = 2001) and d_date_sk is not null) (type: boolean)
                        Statistics: Num rows: 73049 Data size: 81741831 Basic stats: COMPLETE Column stats: COMPLETE
                        Filter Operator
                          predicate: ((d_year = 2001) and d_date_sk is not null) (type: boolean)
                          Statistics: Num rows: 652 Data size: 5216 Basic stats: COMPLETE Column stats: COMPLETE
                          Select Operator
                            expressions: d_date_sk (type: int)
                            outputColumnNames: _col0
                            Statistics: Num rows: 652 Data size: 2608 Basic stats: COMPLETE Column stats: COMPLETE
                            Reduce Output Operator
                              key expressions: _col0 (type: int)
                              sort order: +
                              Map-reduce partition columns: _col0 (type: int)
                              Statistics: Num rows: 652 Data size: 2608 Basic stats: COMPLETE Column stats: COMPLETE
                            Select Operator
                              expressions: _col0 (type: int)
                              outputColumnNames: _col0
                              Statistics: Num rows: 652 Data size: 0 Basic stats: PARTIAL Column stats: COMPLETE
                              Group By Operator
                                keys: _col0 (type: int)
                                mode: hash
                                outputColumnNames: _col0
                                Statistics: Num rows: 652 Data size: 0 Basic stats: PARTIAL Column stats: COMPLETE
                                Dynamic Partitioning Event Operator
                                  Target Input: store_sales
                                  Partition key expr: ss_sold_date_sk
                                  Statistics: Num rows: 652 Data size: 0 Basic stats: PARTIAL Column stats: COMPLETE
                                  Target column: ss_sold_date_sk
                                  Target Vertex: Map 4
                  Execution mode: vectorized
              Map 10
                  Map Operator Tree:
                      TableScan
                        alias: d1
                        filterExpr: d_date_sk is not null (type: boolean)
                        Statistics: Num rows: 73049 Data size: 81741831 Basic stats: COMPLETE Column stats: COMPLETE
                        Filter Operator
                          predicate: d_date_sk is not null (type: boolean)
                          Statistics: Num rows: 73049 Data size: 292196 Basic stats: COMPLETE Column stats: COMPLETE
                          Select Operator
                            expressions: d_date_sk (type: int)
                            outputColumnNames: _col0
                            Statistics: Num rows: 73049 Data size: 292196 Basic stats: COMPLETE Column stats: COMPLETE
                            Reduce Output Operator
                              key expressions: _col0 (type: int)
                              sort order: +
                              Map-reduce partition columns: _col0 (type: int)
                              Statistics: Num rows: 73049 Data size: 292196 Basic stats: COMPLETE Column stats: COMPLETE
                  Execution mode: vectorized
              Map 11
                  Map Operator Tree:
                      TableScan
                        alias: customer
                        filterExpr: (((((c_first_sales_date_sk is not null and c_first_shipto_date_sk is not null) and c_current_cdemo_sk is not null) and c_customer_sk is not null) and c_current_addr_sk is not null) and c_current_hdemo_sk is not null) (type: boolean)
                        Statistics: Num rows: 1600000 Data size: 1241633212 Basic stats: COMPLETE Column stats: COMPLETE
                        Filter Operator
                          predicate: (((((c_first_sales_date_sk is not null and c_first_shipto_date_sk is not null) and c_current_cdemo_sk is not null) and c_customer_sk is not null) and c_current_addr_sk is not null) and c_current_hdemo_sk is not null) (type: boolean)
                          Statistics: Num rows: 1387729 Data size: 32529300 Basic stats: COMPLETE Column stats: COMPLETE
                          Select Operator
                            expressions: c_customer_sk (type: int), c_current_cdemo_sk (type: int), c_current_hdemo_sk (type: int), c_current_addr_sk (type: int), c_first_shipto_date_sk (type: int), c_first_sales_date_sk (type: int)
                            outputColumnNames: _col0, _col1, _col2, _col3, _col4, _col5
                            Statistics: Num rows: 1387729 Data size: 32529300 Basic stats: COMPLETE Column stats: COMPLETE
                            Map Join Operator
                              condition map:
                                   Inner Join 0 to 1
                              condition expressions:
                                0 {_col0} {_col1} {_col2} {_col3} {_col4}
                                1
                              keys:
                                0 _col5 (type: int)
                                1 _col0 (type: int)
                              outputColumnNames: _col0, _col1, _col2, _col3, _col4
                              input vertices:
                                1 Map 10
                              Statistics: Num rows: 1551647 Data size: 31032940 Basic stats: COMPLETE Column stats: COMPLETE
                              Map Join Operator
                                condition map:
                                     Inner Join 0 to 1
                                condition expressions:
                                  0 {_col0} {_col1} {_col2} {_col3}
                                  1
                                keys:
                                  0 _col4 (type: int)
                                  1 _col0 (type: int)
                                outputColumnNames: _col0, _col1, _col2, _col3
                                input vertices:
                                  1 Map 20
                                Statistics: Num rows: 1734927 Data size: 27758832 Basic stats: COMPLETE Column stats: COMPLETE
                                Select Operator
                                  expressions: _col0 (type: int), _col1 (type: int), _col2 (type: int), _col3 (type: int)
                                  outputColumnNames: _col0, _col1, _col2, _col3
                                  Statistics: Num rows: 1734927 Data size: 27758832 Basic stats: COMPLETE Column stats: COMPLETE
                                  Reduce Output Operator
                                    key expressions: _col1 (type: int)
                                    sort order: +
                                    Map-reduce partition columns: _col1 (type: int)
                                    Statistics: Num rows: 1734927 Data size: 27758832 Basic stats: COMPLETE Column stats: COMPLETE
                                    value expressions: _col0 (type: int), _col2 (type: int), _col3 (type: int)
                  Execution mode: vectorized
              Map 12
                  Map Operator Tree:
                      TableScan
                        alias: catalog_sales
                        filterExpr: (cs_item_sk is not null and cs_order_number is not null) (type: boolean)
                        Statistics: Num rows: 286549727 Data size: 37743959324 Basic stats: COMPLETE Column stats: COMPLETE
                        Filter Operator
                          predicate: (cs_item_sk is not null and cs_order_number is not null) (type: boolean)
                          Statistics: Num rows: 286549727 Data size: 3435718732 Basic stats: COMPLETE Column stats: COMPLETE
                          Select Operator
                            expressions: cs_item_sk (type: int), cs_order_number (type: int), cs_ext_list_price (type: float)
                            outputColumnNames: _col0, _col1, _col2
                            Statistics: Num rows: 286549727 Data size: 3435718732 Basic stats: COMPLETE Column stats: COMPLETE
                            Map Join Operator
                              condition map:
                                   Inner Join 0 to 1
                              condition expressions:
                                0 {_col0} {_col2}
                                1 {_col2} {_col3} {_col4}
                              keys:
                                0 _col0 (type: int), _col1 (type: int)
                                1 _col0 (type: int), _col1 (type: int)
                              outputColumnNames: _col0, _col2, _col5, _col6, _col7
                              input vertices:
                                1 Map 14
                              Statistics: Num rows: 7733966 Data size: 123743456 Basic stats: COMPLETE Column stats: COMPLETE
                              Select Operator
                                expressions: _col0 (type: int), _col2 (type: float), ((_col5 + _col6) + _col7) (type: float)
                                outputColumnNames: _col0, _col1, _col2
                                Statistics: Num rows: 7733966 Data size: 123743456 Basic stats: COMPLETE Column stats: COMPLETE
                                Group By Operator
                                  aggregations: sum(_col1), sum(_col2)
                                  keys: _col0 (type: int)
                                  mode: hash
                                  outputColumnNames: _col0, _col1, _col2
                                  Statistics: Num rows: 14754 Data size: 295080 Basic stats: COMPLETE Column stats: COMPLETE
                                  Reduce Output Operator
                                    key expressions: _col0 (type: int)
                                    sort order: +
                                    Map-reduce partition columns: _col0 (type: int)
                                    Statistics: Num rows: 14754 Data size: 295080 Basic stats: COMPLETE Column stats: COMPLETE
                                    value expressions: _col1 (type: double), _col2 (type: double)
                  Execution mode: vectorized
              Map 14
                  Map Operator Tree:
                      TableScan
                        alias: catalog_returns
                        filterExpr: (cr_item_sk is not null and cr_order_number is not null) (type: boolean)
                        Statistics: Num rows: 28798881 Data size: 2942039156 Basic stats: COMPLETE Column stats: COMPLETE
                        Filter Operator
                          predicate: (cr_item_sk is not null and cr_order_number is not null) (type: boolean)
                          Statistics: Num rows: 28798881 Data size: 569059536 Basic stats: COMPLETE Column stats: COMPLETE
                          Select Operator
                            expressions: cr_item_sk (type: int), cr_order_number (type: int), cr_refunded_cash (type: float), cr_reversed_charge (type: float), cr_store_credit (type: float)
                            outputColumnNames: _col0, _col1, _col2, _col3, _col4
                            Statistics: Num rows: 28798881 Data size: 569059536 Basic stats: COMPLETE Column stats: COMPLETE
                            Reduce Output Operator
                              key expressions: _col0 (type: int), _col1 (type: int)
                              sort order: ++
                              Map-reduce partition columns: _col0 (type: int), _col1 (type: int)
                              Statistics: Num rows: 28798881 Data size: 569059536 Basic stats: COMPLETE Column stats: COMPLETE
                              value expressions: _col2 (type: float), _col3 (type: float), _col4 (type: float)
                  Execution mode: vectorized
              Map 15
                  Map Operator Tree:
                      TableScan
                        alias: ad1
                        filterExpr: ca_address_sk is not null (type: boolean)
                        Statistics: Num rows: 800000 Data size: 811903688 Basic stats: COMPLETE Column stats: COMPLETE
                        Filter Operator
                          predicate: ca_address_sk is not null (type: boolean)
                          Statistics: Num rows: 800000 Data size: 3200000 Basic stats: COMPLETE Column stats: COMPLETE
                          Select Operator
                            expressions: ca_address_sk (type: int)
                            outputColumnNames: _col0
                            Statistics: Num rows: 800000 Data size: 3200000 Basic stats: COMPLETE Column stats: COMPLETE
                            Reduce Output Operator
                              key expressions: _col0 (type: int)
                              sort order: +
                              Map-reduce partition columns: _col0 (type: int)
                              Statistics: Num rows: 800000 Data size: 3200000 Basic stats: COMPLETE Column stats: COMPLETE
                  Execution mode: vectorized
              Map 16
                  Map Operator Tree:
                      TableScan
                        alias: hd1
                        filterExpr: (hd_income_band_sk is not null and hd_demo_sk is not null) (type: boolean)
                        Statistics: Num rows: 7200 Data size: 770400 Basic stats: COMPLETE Column stats: COMPLETE
                        Filter Operator
                          predicate: (hd_income_band_sk is not null and hd_demo_sk is not null) (type: boolean)
                          Statistics: Num rows: 7200 Data size: 57600 Basic stats: COMPLETE Column stats: COMPLETE
                          Select Operator
                            expressions: hd_demo_sk (type: int), hd_income_band_sk (type: int)
                            outputColumnNames: _col0, _col1
                            Statistics: Num rows: 7200 Data size: 57600 Basic stats: COMPLETE Column stats: COMPLETE
                            Map Join Operator
                              condition map:
                                   Inner Join 0 to 1
                              condition expressions:
                                0 {_col0}
                                1
                              keys:
                                0 _col1 (type: int)
                                1 _col0 (type: int)
                              outputColumnNames: _col0
                              input vertices:
                                1 Map 7
                              Statistics: Num rows: 8000 Data size: 32000 Basic stats: COMPLETE Column stats: COMPLETE
                              Map Join Operator
                                condition map:
                                     Inner Join 0 to 1
                                condition expressions:
                                  0
                                  1
                                keys:
                                  0 _col0 (type: int)
                                  1 _col19 (type: int)
                                input vertices:
                                  1 Map 4
                                Statistics: Num rows: 90416698032652288 Data size: 0 Basic stats: PARTIAL Column stats: NONE
                                Select Operator
                                  Statistics: Num rows: 90416698032652288 Data size: 0 Basic stats: PARTIAL Column stats: NONE
                                  Group By Operator
                                    aggregations: count()
                                    mode: hash
                                    outputColumnNames: _col0
                                    Statistics: Num rows: 1 Data size: 8 Basic stats: COMPLETE Column stats: NONE
                                    Reduce Output Operator
                                      sort order:
                                      Statistics: Num rows: 1 Data size: 8 Basic stats: COMPLETE Column stats: NONE
                                      value expressions: _col0 (type: bigint)
                  Execution mode: vectorized
              Map 18
                  Map Operator Tree:
                      TableScan
                        alias: promotion
                        filterExpr: p_promo_sk is not null (type: boolean)
                        Statistics: Num rows: 450 Data size: 530848 Basic stats: COMPLETE Column stats: COMPLETE
                        Filter Operator
                          predicate: p_promo_sk is not null (type: boolean)
                          Statistics: Num rows: 450 Data size: 1800 Basic stats: COMPLETE Column stats: COMPLETE
                          Select Operator
                            expressions: p_promo_sk (type: int)
                            outputColumnNames: _col0
                            Statistics: Num rows: 450 Data size: 1800 Basic stats: COMPLETE Column stats: COMPLETE
                            Reduce Output Operator
                              key expressions: _col0 (type: int)
                              sort order: +
                              Map-reduce partition columns: _col0 (type: int)
                              Statistics: Num rows: 450 Data size: 1800 Basic stats: COMPLETE Column stats: COMPLETE
                  Execution mode: vectorized
              Map 19
                  Map Operator Tree:
                      TableScan
                        alias: ad1
                        filterExpr: ca_address_sk is not null (type: boolean)
                        Statistics: Num rows: 800000 Data size: 811903688 Basic stats: COMPLETE Column stats: COMPLETE
                        Filter Operator
                          predicate: ca_address_sk is not null (type: boolean)
                          Statistics: Num rows: 800000 Data size: 3200000 Basic stats: COMPLETE Column stats: COMPLETE
                          Select Operator
                            expressions: ca_address_sk (type: int)
                            outputColumnNames: _col0
                            Statistics: Num rows: 800000 Data size: 3200000 Basic stats: COMPLETE Column stats: COMPLETE
                            Reduce Output Operator
                              key expressions: _col0 (type: int)
                              sort order: +
                              Map-reduce partition columns: _col0 (type: int)
                              Statistics: Num rows: 800000 Data size: 3200000 Basic stats: COMPLETE Column stats: COMPLETE
                  Execution mode: vectorized
              Map 2
                  Map Operator Tree:
                      TableScan
                        alias: cd1
                        filterExpr: cd_demo_sk is not null (type: boolean)
                        Statistics: Num rows: 1920800 Data size: 718379200 Basic stats: COMPLETE Column stats: COMPLETE
                        Filter Operator
                          predicate: cd_demo_sk is not null (type: boolean)
                          Statistics: Num rows: 1920800 Data size: 170951200 Basic stats: COMPLETE Column stats: COMPLETE
                          Select Operator
                            expressions: cd_demo_sk (type: int), cd_marital_status (type: string)
                            outputColumnNames: _col0, _col1
                            Statistics: Num rows: 1920800 Data size: 170951200 Basic stats: COMPLETE Column stats: COMPLETE
                            Reduce Output Operator
                              key expressions: _col0 (type: int)
                              sort order: +
                              Map-reduce partition columns: _col0 (type: int)
                              Statistics: Num rows: 1920800 Data size: 170951200 Basic stats: COMPLETE Column stats: COMPLETE
                              value expressions: _col1 (type: string)
                  Execution mode: vectorized
              Map 20
                  Map Operator Tree:
                      TableScan
                        alias: d1
                        filterExpr: d_date_sk is not null (type: boolean)
                        Statistics: Num rows: 73049 Data size: 81741831 Basic stats: COMPLETE Column stats: COMPLETE
                        Filter Operator
                          predicate: d_date_sk is not null (type: boolean)
                          Statistics: Num rows: 73049 Data size: 292196 Basic stats: COMPLETE Column stats: COMPLETE
                          Select Operator
                            expressions: d_date_sk (type: int)
                            outputColumnNames: _col0
                            Statistics: Num rows: 73049 Data size: 292196 Basic stats: COMPLETE Column stats: COMPLETE
                            Reduce Output Operator
                              key expressions: _col0 (type: int)
                              sort order: +
                              Map-reduce partition columns: _col0 (type: int)
                              Statistics: Num rows: 73049 Data size: 292196 Basic stats: COMPLETE Column stats: COMPLETE
                  Execution mode: vectorized
              Map 21
                  Map Operator Tree:
                      TableScan
                        alias: ib1
                        filterExpr: ib_income_band_sk is not null (type: boolean)
                        Statistics: Num rows: 20 Data size: 240 Basic stats: COMPLETE Column stats: COMPLETE
                        Filter Operator
                          predicate: ib_income_band_sk is not null (type: boolean)
                          Statistics: Num rows: 20 Data size: 80 Basic stats: COMPLETE Column stats: COMPLETE
                          Select Operator
                            expressions: ib_income_band_sk (type: int)
                            outputColumnNames: _col0
                            Statistics: Num rows: 20 Data size: 80 Basic stats: COMPLETE Column stats: COMPLETE
                            Reduce Output Operator
                              key expressions: _col0 (type: int)
                              sort order: +
                              Map-reduce partition columns: _col0 (type: int)
                              Statistics: Num rows: 20 Data size: 80 Basic stats: COMPLETE Column stats: COMPLETE
                  Execution mode: vectorized
              Map 3
                  Map Operator Tree:
                      TableScan
                        alias: item
                        filterExpr: ((((i_color) IN ('maroon', 'burnished', 'dim', 'steel', 'navajo', 'chocolate') and i_current_price BETWEEN 35 AND 45) and i_current_price BETWEEN 36 AND 50) and i_item_sk is not null) (type: boolean)
                        Statistics: Num rows: 48000 Data size: 68732712 Basic stats: COMPLETE Column stats: COMPLETE
                        Filter Operator
                          predicate: ((((i_color) IN ('maroon', 'burnished', 'dim', 'steel', 'navajo', 'chocolate') and i_current_price BETWEEN 35 AND 45) and i_current_price BETWEEN 36 AND 50) and i_item_sk is not null) (type: boolean)
                          Statistics: Num rows: 6000 Data size: 581936 Basic stats: COMPLETE Column stats: COMPLETE
                          Select Operator
                            expressions: i_item_sk (type: int)
                            outputColumnNames: _col0
                            Statistics: Num rows: 6000 Data size: 24000 Basic stats: COMPLETE Column stats: COMPLETE
                            Reduce Output Operator
                              key expressions: _col0 (type: int)
                              sort order: +
                              Map-reduce partition columns: _col0 (type: int)
                              Statistics: Num rows: 6000 Data size: 24000 Basic stats: COMPLETE Column stats: COMPLETE
                  Execution mode: vectorized
              Map 4
                  Map Operator Tree:
                      TableScan
                        alias: store_sales
                        filterExpr: (((((((ss_item_sk is not null and ss_store_sk is not null) and ss_cdemo_sk is not null) and ss_customer_sk is not null) and ss_ticket_number is not null) and ss_addr_sk is not null) and ss_promo_sk is not null) and ss_hdemo_sk is not null) (type: boolean)
                        Statistics: Num rows: 550076554 Data size: 47370018896 Basic stats: COMPLETE Column stats: COMPLETE
                        Filter Operator
                          predicate: (((((((ss_item_sk is not null and ss_store_sk is not null) and ss_cdemo_sk is not null) and ss_customer_sk is not null) and ss_ticket_number is not null) and ss_addr_sk is not null) and ss_promo_sk is not null) and ss_hdemo_sk is not null) (type: boolean)
                          Statistics: Num rows: 476766967 Data size: 14987001212 Basic stats: COMPLETE Column stats: COMPLETE
                          Select Operator
                            expressions: ss_item_sk (type: int), ss_customer_sk (type: int), ss_cdemo_sk (type: int), ss_hdemo_sk (type: int), ss_addr_sk (type: int), ss_store_sk (type: int), ss_promo_sk (type: int), ss_ticket_number (type: int), ss_sold_date_sk (type: int)
                            outputColumnNames: _col0, _col1, _col2, _col3, _col4, _col5, _col6, _col7, _col8
                            Statistics: Num rows: 476766967 Data size: 16894069080 Basic stats: COMPLETE Column stats: COMPLETE
                            Map Join Operator
                              condition map:
                                   Inner Join 0 to 1
                              condition expressions:
                                0 {_col0} {_col1} {_col2} {_col3} {_col4} {_col5} {_col6} {_col7} {_col8}
                                1
                              keys:
                                0 _col0 (type: int)
                                1 _col0 (type: int)
                              outputColumnNames: _col0, _col1, _col2, _col3, _col4, _col5, _col6, _col7, _col8
                              input vertices:
                                1 Map 3
                              Statistics: Num rows: 365759084 Data size: 13167327024 Basic stats: COMPLETE Column stats: COMPLETE
                              Map Join Operator
                                condition map:
                                     Inner Join 0 to 1
                                condition expressions:
                                  0 {_col0} {_col1} {_col2} {_col3} {_col4} {_col5} {_col6} {_col7}
                                  1
                                keys:
                                  0 _col8 (type: int)
                                  1 _col0 (type: int)
                                outputColumnNames: _col0, _col1, _col2, _col3, _col4, _col5, _col6, _col7
                                input vertices:
                                  1 Map 1
                                Statistics: Num rows: 408347470 Data size: 13067119040 Basic stats: COMPLETE Column stats: COMPLETE
                                Select Operator
                                  expressions: _col0 (type: int), _col1 (type: int), _col2 (type: int), _col3 (type: int), _col4 (type: int), _col5 (type: int), _col6 (type: int), _col7 (type: int)
                                  outputColumnNames: _col0, _col1, _col2, _col3, _col4, _col5, _col6, _col7
                                  Statistics: Num rows: 408347470 Data size: 13067119040 Basic stats: COMPLETE Column stats: COMPLETE
                                  Map Join Operator
                                    condition map:
                                         Inner Join 0 to 1
                                    condition expressions:
                                      0
                                      1 {_col0} {_col1} {_col2} {_col3} {_col4} {_col6} {_col7}
                                    keys:
                                      0 _col0 (type: int)
                                      1 _col5 (type: int)
                                    outputColumnNames: _col1, _col2, _col3, _col4, _col5, _col7, _col8
                                    input vertices:
                                      0 Map 9
                                    Statistics: Num rows: 1095818527 Data size: 30682918756 Basic stats: COMPLETE Column stats: COMPLETE
                                    Select Operator
                                      expressions: _col1 (type: int), _col2 (type: int), _col3 (type: int), _col4 (type: int), _col5 (type: int), _col7 (type: int), _col8 (type: int)
                                      outputColumnNames: _col1, _col2, _col3, _col4, _col5, _col7, _col8
                                      Statistics: Num rows: 1095818527 Data size: 30682918756 Basic stats: COMPLETE Column stats: COMPLETE
                                      Map Join Operator
                                        condition map:
                                             Inner Join 0 to 1
                                        condition expressions:
                                          0 {_col1}
                                          1 {_col1} {_col2} {_col4} {_col5} {_col7} {_col8}
                                        keys:
                                          0 _col0 (type: int)
                                          1 _col3 (type: int)
                                        outputColumnNames: _col1, _col3, _col4, _col6, _col7, _col9, _col10
                                        input vertices:
                                          0 Map 2
                                        Statistics: Num rows: 26284318514 Data size: 2864990718026 Basic stats: COMPLETE Column stats: COMPLETE
                                        Select Operator
                                          expressions: _col1 (type: string), _col10 (type: int), _col3 (type: int), _col4 (type: int), _col6 (type: int), _col7 (type: int), _col9 (type: int)
                                          outputColumnNames: _col1, _col10, _col3, _col4, _col6, _col7, _col9
                                          Statistics: Num rows: 26284318514 Data size: 2864990718026 Basic stats: COMPLETE Column stats: COMPLETE
                                          Map Join Operator
                                            condition map:
                                                 Inner Join 0 to 1
                                            condition expressions:
                                              0 {_col1} {_col4} {_col5}
                                              1 {_col1} {_col3} {_col6} {_col7} {_col9} {_col10}
                                            keys:
                                              0 _col2 (type: int)
                                              1 _col4 (type: int)
                                            outputColumnNames: _col1, _col4, _col5, _col11, _col13, _col16, _col17, _col19, _col20
                                            input vertices:
                                              0 Map 5
                                            Statistics: Num rows: 1259845072505 Data size: 137323112903045 Basic stats: COMPLETE Column stats: COMPLETE
                                            Filter Operator
                                              predicate: (_col11 <> _col1) (type: boolean)
                                              Statistics: Num rows: 1259845072505 Data size: 137323112903045 Basic stats: COMPLETE Column stats: COMPLETE
                                              Select Operator
                                                expressions: _col13 (type: int), _col16 (type: int), _col17 (type: int), _col19 (type: int), _col20 (type: int), _col4 (type: int), _col5 (type: int)
                                                outputColumnNames: _col13, _col16, _col17, _col19, _col20, _col4, _col5
                                                Statistics: Num rows: 1259845072505 Data size: 30236281740120 Basic stats: COMPLETE Column stats: COMPLETE
                                                Map Join Operator
                                                  condition map:
                                                       Inner Join 0 to 1
                                                  condition expressions:
                                                    0
                                                    1 {_col4} {_col5} {_col13} {_col16} {_col17} {_col19}
                                                  keys:
                                                    0 _col0 (type: int), _col1 (type: int)
                                                    1 _col13 (type: int), _col20 (type: int)
                                                  outputColumnNames: _col6, _col7, _col15, _col18, _col19, _col21
                                                  input vertices:
                                                    0 Map 8
                                                  Statistics: Num rows: 102517810489 Data size: 2050356209780 Basic stats: COMPLETE Column stats: COMPLETE
                                                  Select Operator
                                                    expressions: _col15 (type: int), _col6 (type: int), _col7 (type: int), _col18 (type: int), _col19 (type: int), _col21 (type: int)
                                                    outputColumnNames: _col0, _col13, _col14, _col3, _col4, _col6
                                                    Statistics: Num rows: 102517810489 Data size: 2050356209780 Basic stats: COMPLETE Column stats: COMPLETE
                                                    Map Join Operator
                                                      condition map:
                                                           Inner Join 0 to 1
                                                      condition expressions:
                                                        0
                                                        1 {_col0} {_col3} {_col6} {_col13} {_col14}
                                                      keys:
                                                        0 _col0 (type: int)
                                                        1 _col4 (type: int)
                                                      outputColumnNames: _col1, _col4, _col7, _col14, _col15
                                                      input vertices:
                                                        0 Map 15
                                                      Statistics: Num rows: 13141203075020 Data size: 210259249200320 Basic stats: COMPLETE Column stats: COMPLETE
                                                      Select Operator
                                                        expressions: _col1 (type: int), _col14 (type: int), _col15 (type: int), _col4 (type: int), _col7 (type: int)
                                                        outputColumnNames: _col1, _col14, _col15, _col4, _col7
                                                        Statistics: Num rows: 13141203075020 Data size: 210259249200320 Basic stats: COMPLETE Column stats: COMPLETE
                                                        Map Join Operator
                                                          condition map:
                                                               Inner Join 0 to 1
                                                          condition expressions:
                                                            0
                                                            1 {_col1} {_col4} {_col7} {_col14}
                                                          keys:
                                                            0 _col0 (type: int)
                                                            1 _col15 (type: int)
                                                          outputColumnNames: _col2, _col5, _col8, _col15
                                                          input vertices:
                                                            0 Map 19
                                                          Statistics: Num rows: 239649914744597 Data size: 2875798976935164 Basic stats: COMPLETE Column stats: COMPLETE
                                                          Map Join Operator
                                                            condition map:
                                                                 Inner Join 0 to 1
                                                            condition expressions:
                                                              0 {_col5} {_col8} {_col15}
                                                              1
                                                            keys:
                                                              0 _col2 (type: int)
                                                              1 _col0 (type: int)
                                                            outputColumnNames: _col5, _col8, _col15
                                                            input vertices:
                                                              1 Reducer 13
                                                            Statistics: Num rows: 239649914744597 Data size: 1917199317956776 Basic stats: COMPLETE Column stats: COMPLETE
                                                            Select Operator
                                                              expressions: _col15 (type: int), _col5 (type: int), _col8 (type: int)
                                                              outputColumnNames: _col15, _col5, _col8
                                                              Statistics: Num rows: 239649914744597 Data size: 1917199317956776 Basic stats: COMPLETE Column stats: COMPLETE
                                                              Map Join Operator
                                                                condition map:
                                                                     Inner Join 0 to 1
                                                                condition expressions:
                                                                  0
                                                                  1 {_col5} {_col15}
                                                                keys:
                                                                  0 _col0 (type: int)
                                                                  1 _col8 (type: int)
                                                                outputColumnNames: _col6, _col16
                                                                input vertices:
                                                                  0 Map 18
                                                                Statistics: Num rows: 6740153852191791 Data size: 26960615408767164 Basic stats: COMPLETE Column stats: COMPLETE
                                                                Select Operator
                                                                  expressions: _col16 (type: int), _col6 (type: int)
                                                                  outputColumnNames: _col16, _col6
                                                                  Statistics: Num rows: 6740153852191791 Data size: 26960615408767164 Basic stats: COMPLETE Column stats: COMPLETE
                                                                  Map Join Operator
                                                                    condition map:
                                                                         Inner Join 0 to 1
                                                                    condition expressions:
                                                                      0
                                                                      1 {_col16}
                                                                    keys:
                                                                      0 _col0 (type: int)
                                                                      1 _col6 (type: int)
                                                                    outputColumnNames: _col19
                                                                    input vertices:
                                                                      0 Map 6
                                                                    Statistics: Num rows: 82196998197460864 Data size: 0 Basic stats: PARTIAL Column stats: COMPLETE
                                                                    Select Operator
                                                                      expressions: _col19 (type: int)
                                                                      outputColumnNames: _col19
                                                                      Statistics: Num rows: 82196998197460864 Data size: 0 Basic stats: PARTIAL Column stats: COMPLETE
                                                                      Reduce Output Operator
                                                                        key expressions: _col19 (type: int)
                                                                        sort order: +
                                                                        Map-reduce partition columns: _col19 (type: int)
                                                                        Statistics: Num rows: 82196998197460864 Data size: 0 Basic stats: PARTIAL Column stats: COMPLETE
                  Execution mode: vectorized
              Map 5
                  Map Operator Tree:
                      TableScan
                        alias: cd1
                        filterExpr: cd_demo_sk is not null (type: boolean)
                        Statistics: Num rows: 1920800 Data size: 718379200 Basic stats: COMPLETE Column stats: COMPLETE
                        Filter Operator
                          predicate: cd_demo_sk is not null (type: boolean)
                          Statistics: Num rows: 1920800 Data size: 170951200 Basic stats: COMPLETE Column stats: COMPLETE
                          Select Operator
                            expressions: cd_demo_sk (type: int), cd_marital_status (type: string)
                            outputColumnNames: _col0, _col1
                            Statistics: Num rows: 1920800 Data size: 170951200 Basic stats: COMPLETE Column stats: COMPLETE
                            Map Join Operator
                              condition map:
                                   Inner Join 0 to 1
                              condition expressions:
                                0 {_col1}
                                1 {_col0} {_col2} {_col3}
                              keys:
                                0 _col0 (type: int)
                                1 _col1 (type: int)
                              outputColumnNames: _col1, _col2, _col4, _col5
                              input vertices:
                                1 Map 11
                              Statistics: Num rows: 3675622 Data size: 44107464 Basic stats: COMPLETE Column stats: COMPLETE
                              Reduce Output Operator
                                key expressions: _col2 (type: int)
                                sort order: +
                                Map-reduce partition columns: _col2 (type: int)
                                Statistics: Num rows: 3675622 Data size: 44107464 Basic stats: COMPLETE Column stats: COMPLETE
                                value expressions: _col1 (type: string), _col4 (type: int), _col5 (type: int)
                  Execution mode: vectorized
              Map 6
                  Map Operator Tree:
                      TableScan
                        alias: hd1
                        filterExpr: (hd_income_band_sk is not null and hd_demo_sk is not null) (type: boolean)
                        Statistics: Num rows: 7200 Data size: 770400 Basic stats: COMPLETE Column stats: COMPLETE
                        Filter Operator
                          predicate: (hd_income_band_sk is not null and hd_demo_sk is not null) (type: boolean)
                          Statistics: Num rows: 7200 Data size: 57600 Basic stats: COMPLETE Column stats: COMPLETE
                          Select Operator
                            expressions: hd_demo_sk (type: int), hd_income_band_sk (type: int)
                            outputColumnNames: _col0, _col1
                            Statistics: Num rows: 7200 Data size: 57600 Basic stats: COMPLETE Column stats: COMPLETE
                            Map Join Operator
                              condition map:
                                   Inner Join 0 to 1
                              condition expressions:
                                0 {_col0}
                                1
                              keys:
                                0 _col1 (type: int)
                                1 _col0 (type: int)
                              outputColumnNames: _col0
                              input vertices:
                                1 Map 21
                              Statistics: Num rows: 8000 Data size: 32000 Basic stats: COMPLETE Column stats: COMPLETE
                              Reduce Output Operator
                                key expressions: _col0 (type: int)
                                sort order: +
                                Map-reduce partition columns: _col0 (type: int)
                                Statistics: Num rows: 8000 Data size: 32000 Basic stats: COMPLETE Column stats: COMPLETE
                  Execution mode: vectorized
              Map 7
                  Map Operator Tree:
                      TableScan
                        alias: ib1
                        filterExpr: ib_income_band_sk is not null (type: boolean)
                        Statistics: Num rows: 20 Data size: 240 Basic stats: COMPLETE Column stats: COMPLETE
                        Filter Operator
                          predicate: ib_income_band_sk is not null (type: boolean)
                          Statistics: Num rows: 20 Data size: 80 Basic stats: COMPLETE Column stats: COMPLETE
                          Select Operator
                            expressions: ib_income_band_sk (type: int)
                            outputColumnNames: _col0
                            Statistics: Num rows: 20 Data size: 80 Basic stats: COMPLETE Column stats: COMPLETE
                            Reduce Output Operator
                              key expressions: _col0 (type: int)
                              sort order: +
                              Map-reduce partition columns: _col0 (type: int)
                              Statistics: Num rows: 20 Data size: 80 Basic stats: COMPLETE Column stats: COMPLETE
                  Execution mode: vectorized
              Map 8
                  Map Operator Tree:
                      TableScan
                        alias: store_returns
                        filterExpr: (sr_item_sk is not null and sr_ticket_number is not null) (type: boolean)
                        Statistics: Num rows: 55578005 Data size: 4155315616 Basic stats: COMPLETE Column stats: COMPLETE
                        Filter Operator
                          predicate: (sr_item_sk is not null and sr_ticket_number is not null) (type: boolean)
                          Statistics: Num rows: 55578005 Data size: 444624040 Basic stats: COMPLETE Column stats: COMPLETE
                          Select Operator
                            expressions: sr_item_sk (type: int), sr_ticket_number (type: int)
                            outputColumnNames: _col0, _col1
                            Statistics: Num rows: 55578005 Data size: 444624040 Basic stats: COMPLETE Column stats: COMPLETE
                            Reduce Output Operator
                              key expressions: _col0 (type: int), _col1 (type: int)
                              sort order: ++
                              Map-reduce partition columns: _col0 (type: int), _col1 (type: int)
                              Statistics: Num rows: 55578005 Data size: 444624040 Basic stats: COMPLETE Column stats: COMPLETE
                  Execution mode: vectorized
              Map 9
                  Map Operator Tree:
                      TableScan
                        alias: store
                        filterExpr: s_store_sk is not null (type: boolean)
                        Statistics: Num rows: 212 Data size: 405680 Basic stats: COMPLETE Column stats: COMPLETE
                        Filter Operator
                          predicate: s_store_sk is not null (type: boolean)
                          Statistics: Num rows: 212 Data size: 848 Basic stats: COMPLETE Column stats: COMPLETE
                          Select Operator
                            expressions: s_store_sk (type: int)
                            outputColumnNames: _col0
                            Statistics: Num rows: 212 Data size: 848 Basic stats: COMPLETE Column stats: COMPLETE
                            Reduce Output Operator
                              key expressions: _col0 (type: int)
                              sort order: +
                              Map-reduce partition columns: _col0 (type: int)
                              Statistics: Num rows: 212 Data size: 848 Basic stats: COMPLETE Column stats: COMPLETE
                  Execution mode: vectorized
              Reducer 13
                  Reduce Operator Tree:
                    Group By Operator
                      aggregations: sum(VALUE._col0), sum(VALUE._col1)
                      keys: KEY._col0 (type: int)
                      mode: mergepartial
                      outputColumnNames: _col0, _col1, _col2
                      Statistics: Num rows: 14754 Data size: 354096 Basic stats: COMPLETE Column stats: COMPLETE
                      Filter Operator
                        predicate: (_col1 > (UDFToDouble(2) * _col2)) (type: boolean)
                        Statistics: Num rows: 4918 Data size: 118032 Basic stats: COMPLETE Column stats: COMPLETE
                        Select Operator
                          expressions: _col0 (type: int)
                          outputColumnNames: _col0
                          Statistics: Num rows: 4918 Data size: 39344 Basic stats: COMPLETE Column stats: COMPLETE
                          Reduce Output Operator
                            key expressions: _col0 (type: int)
                            sort order: +
                            Map-reduce partition columns: _col0 (type: int)
                            Statistics: Num rows: 4918 Data size: 39344 Basic stats: COMPLETE Column stats: COMPLETE
                  Execution mode: vectorized
              Reducer 17
                  Reduce Operator Tree:
                    Group By Operator
                      aggregations: count(VALUE._col0)
                      mode: mergepartial
                      outputColumnNames: _col0
                      Statistics: Num rows: 1 Data size: 8 Basic stats: COMPLETE Column stats: NONE
                      Select Operator
                        expressions: _col0 (type: bigint)
                        outputColumnNames: _col0
                        Statistics: Num rows: 1 Data size: 8 Basic stats: COMPLETE Column stats: NONE
                        File Output Operator
                          compressed: false
                          Statistics: Num rows: 1 Data size: 8 Basic stats: COMPLETE Column stats: NONE
                          table:
                              input format: org.apache.hadoop.mapred.TextInputFormat
                              output format: org.apache.hadoop.hive.ql.io.HiveIgnoreKeyTextOutputFormat
                              serde: org.apache.hadoop.hive.serde2.lazy.LazySimpleSerDe
                  Execution mode: vectorized
      
        Stage: Stage-0
          Fetch Operator
            limit: -1
            Processor Tree:
              ListSink
      
      Time taken: 12.6 seconds, Fetched: 738 row(s)
      

      It looks like an overflow is happening: the key count from statistics exceeds the int range, so the map size is set to Integer.MAX_VALUE, and nextHighestPowerOfTwo then overflows to Integer.MIN_VALUE (see the log lines below).

      2014-10-13 23:18:08,215 INFO [TezChild] org.apache.hadoop.hive.ql.exec.persistence.HashMapWrapper: Key count from statistics is 82196998197460864; setting map size to 2147483647
      2014-10-13 23:18:08,215 INFO [TezChild] org.apache.hadoop.hive.ql.exec.persistence.BytesBytesMultiHashMap: initialCapacity in :2147483647
      2014-10-13 23:18:08,215 INFO [TezChild] org.apache.hadoop.hive.ql.exec.persistence.BytesBytesMultiHashMap: initialCapacity out :-2147483648
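
      The arithmetic behind these three log lines can be reproduced in isolation. The sketch below is not taken from the Hive source: the class name, clampToInt, and safeCapacity are hypothetical, and the body of nextHighestPowerOfTwo is an assumption (highest set bit shifted left by one) chosen because it reproduces the logged values. It only illustrates why rounding Integer.MAX_VALUE up to a power of two wraps to a negative int, and one possible guard.

```java
public class CapacityOverflowDemo {

    // Assumed shape of the helper named in the log (not copied from Hive):
    // take the highest set bit and double it.
    static int nextHighestPowerOfTwo(int v) {
        return Integer.highestOneBit(v) << 1;
    }

    // Clamp a long key-count estimate into the int range, mirroring the
    // "setting map size to 2147483647" log line. Hypothetical helper.
    static int clampToInt(long keyCount) {
        return (int) Math.min(keyCount, (long) Integer.MAX_VALUE);
    }

    // Hypothetical guard: never exceed 2^30, the largest power of two
    // that still fits in a positive int.
    static int safeCapacity(long keyCount) {
        final int maxPowerOfTwo = 1 << 30;
        if (keyCount >= maxPowerOfTwo) {
            return maxPowerOfTwo;
        }
        int v = (int) Math.max(keyCount, 1L);
        int high = Integer.highestOneBit(v);
        // Already a power of two: keep it; otherwise round up.
        return high == v ? v : high << 1;
    }

    public static void main(String[] args) {
        long keyCount = 82196998197460864L; // "Num rows" estimate from the plan
        int in = clampToInt(keyCount);
        System.out.println("initialCapacity in  : " + in);
        System.out.println("initialCapacity out : " + nextHighestPowerOfTwo(in));
        System.out.println("guarded capacity    : " + safeCapacity(keyCount));
    }
}
```

      Under these assumptions, the highest set bit of 2147483647 is 2^30, and shifting it left by one yields 2^31, which wraps to -2147483648 in a 32-bit signed int. The guard is only one way to avoid the wrap; the root cause reported in this issue is the inflated row estimate itself.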
      

      Attachments

        1. HIVE-8454.7.patch
          73 kB
          Prasanth Jayachandran
        2. HIVE-8454.6.patch
          68 kB
          Prasanth Jayachandran
        3. HIVE-8454.5.patch
          67 kB
          Prasanth Jayachandran
        4. HIVE-8454.4.patch
          25 kB
          Prasanth Jayachandran
        5. HIVE-8454.3.patch
          21 kB
          Prasanth Jayachandran
        6. HIVE-8454.3.patch
          21 kB
          Gunther Hagleitner
        7. HIVE-8454.2.patch
          9 kB
          Prasanth Jayachandran
        8. HIVE-8454.1.patch
          28 kB
          Prasanth Jayachandran

            People

              prasanth_j Prasanth Jayachandran
              mmokhtar Mostafa Mokhtar