Details
Sub-task
Status: Closed
Critical
Resolution: Fixed
0.14.0
None
Description
The estimated data size of some Select Operators is 0. BytesBytesMultiHashMap uses the data size to determine the estimated initial number of entries in the hash map. If the data size is 0, an exception is thrown (see the stack trace below).
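The failure mode described above can be illustrated with a minimal sketch. It is not Hive's code: the class `CapacitySketch`, the `bytesPerEntry` parameter, and the clamping strategy in `safeCapacity` are hypothetical and only show how a zero size estimate can fail a power-of-two check, and one possible way to guard against it.

```java
public final class CapacitySketch {

    // Same kind of check that yields "Capacity must be a power of two".
    // Zero is rejected because it has no set bit.
    static boolean isPowerOfTwo(long n) {
        return n > 0 && (n & (n - 1)) == 0;
    }

    // Hypothetical estimate: planner data size divided by an assumed
    // per-entry footprint. A data size of 0 gives 0 entries.
    static long estimateEntries(long dataSizeBytes, long bytesPerEntry) {
        return dataSizeBytes / bytesPerEntry;
    }

    // Hypothetical guard: enforce a floor, cap at 2^30 so the result fits
    // in an int, then round up to the next power of two.
    static int safeCapacity(long estimatedEntries, int minCapacity) {
        long n = Math.max(estimatedEntries, (long) minCapacity);
        n = Math.min(n, 1L << 30);
        long highest = Long.highestOneBit(n);
        return (int) (highest == n ? n : highest << 1);
    }

    public static void main(String[] args) {
        long entries = estimateEntries(0L, 8L);  // "Data size: 0" in the plan
        System.out.println("estimate valid as capacity: " + isPowerOfTwo(entries));
        System.out.println("guarded capacity: " + safeCapacity(entries, 16));
    }
}
```

Under these assumptions, a zero estimate passed through unchanged fails the check, while the guarded value is always a valid capacity regardless of how poor the statistics are.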
Query
select count(*)
from store_sales
JOIN store_returns ON store_sales.ss_item_sk = store_returns.sr_item_sk and store_sales.ss_ticket_number = store_returns.sr_ticket_number
JOIN customer ON store_sales.ss_customer_sk = customer.c_customer_sk
JOIN date_dim d1 ON store_sales.ss_sold_date_sk = d1.d_date_sk
JOIN date_dim d2 ON customer.c_first_sales_date_sk = d2.d_date_sk
JOIN date_dim d3 ON customer.c_first_shipto_date_sk = d3.d_date_sk
JOIN store ON store_sales.ss_store_sk = store.s_store_sk
JOIN item ON store_sales.ss_item_sk = item.i_item_sk
JOIN customer_demographics cd1 ON store_sales.ss_cdemo_sk = cd1.cd_demo_sk
JOIN customer_demographics cd2 ON customer.c_current_cdemo_sk = cd2.cd_demo_sk
JOIN promotion ON store_sales.ss_promo_sk = promotion.p_promo_sk
JOIN household_demographics hd1 ON store_sales.ss_hdemo_sk = hd1.hd_demo_sk
JOIN household_demographics hd2 ON customer.c_current_hdemo_sk = hd2.hd_demo_sk
JOIN customer_address ad1 ON store_sales.ss_addr_sk = ad1.ca_address_sk
JOIN customer_address ad2 ON customer.c_current_addr_sk = ad2.ca_address_sk
JOIN income_band ib1 ON hd1.hd_income_band_sk = ib1.ib_income_band_sk
JOIN income_band ib2 ON hd2.hd_income_band_sk = ib2.ib_income_band_sk
JOIN (select cs_item_sk,
             sum(cs_ext_list_price) as sale,
             sum(cr_refunded_cash+cr_reversed_charge+cr_store_credit) as refund
      from catalog_sales
      JOIN catalog_returns ON catalog_sales.cs_item_sk = catalog_returns.cr_item_sk and catalog_sales.cs_order_number = catalog_returns.cr_order_number
      group by cs_item_sk
      having sum(cs_ext_list_price)>2*sum(cr_refunded_cash+cr_reversed_charge+cr_store_credit)) cs_ui ON store_sales.ss_item_sk = cs_ui.cs_item_sk
WHERE cd1.cd_marital_status <> cd2.cd_marital_status
  and i_color in ('maroon','burnished','dim','steel','navajo','chocolate')
  and i_current_price between 35 and 35 + 10
  and i_current_price between 35 + 1 and 35 + 15
  and d1.d_year = 2001;
], TaskAttempt 3 failed, info=[Error: Failure while running task:java.lang.RuntimeException: java.lang.RuntimeException: java.lang.AssertionError: Capacity must be a power of two
    at org.apache.hadoop.hive.ql.exec.tez.TezProcessor.initializeAndRunProcessor(TezProcessor.java:187)
    at org.apache.hadoop.hive.ql.exec.tez.TezProcessor.run(TezProcessor.java:142)
    at org.apache.tez.runtime.LogicalIOProcessorRuntimeTask.run(LogicalIOProcessorRuntimeTask.java:324)
    at org.apache.tez.runtime.task.TezTaskRunner$TaskRunnerCallable$1.run(TezTaskRunner.java:180)
    at org.apache.tez.runtime.task.TezTaskRunner$TaskRunnerCallable$1.run(TezTaskRunner.java:172)
    at java.security.AccessController.doPrivileged(Native Method)
    at javax.security.auth.Subject.doAs(Subject.java:415)
    at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1548)
    at org.apache.tez.runtime.task.TezTaskRunner$TaskRunnerCallable.call(TezTaskRunner.java:172)
    at org.apache.tez.runtime.task.TezTaskRunner$TaskRunnerCallable.call(TezTaskRunner.java:167)
    at java.util.concurrent.FutureTask.run(FutureTask.java:262)
    at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145)
    at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615)
    at java.lang.Thread.run(Thread.java:744)
Caused by: java.lang.RuntimeException: java.lang.AssertionError: Capacity must be a power of two
    at org.apache.hadoop.hive.ql.exec.tez.MapRecordSource.processRow(MapRecordSource.java:93)
    at org.apache.hadoop.hive.ql.exec.tez.MapRecordSource.pushRecord(MapRecordSource.java:70)
    at org.apache.hadoop.hive.ql.exec.tez.MapRecordProcessor.run(MapRecordProcessor.java:273)
    at org.apache.hadoop.hive.ql.exec.tez.TezProcessor.initializeAndRunProcessor(TezProcessor.java:164)
    ... 13 more
Caused by: java.lang.AssertionError: Capacity must be a power of two
    at org.apache.hadoop.hive.ql.exec.persistence.BytesBytesMultiHashMap.validateCapacity(BytesBytesMultiHashMap.java:302)
    at org.apache.hadoop.hive.ql.exec.persistence.BytesBytesMultiHashMap.<init>(BytesBytesMultiHashMap.java:159)
    at org.apache.hadoop.hive.ql.exec.persistence.MapJoinBytesTableContainer.<init>(MapJoinBytesTableContainer.java:73)
    at org.apache.hadoop.hive.ql.exec.persistence.MapJoinBytesTableContainer.<init>(MapJoinBytesTableContainer.java:64)
    at org.apache.hadoop.hive.ql.exec.tez.HashTableLoader.load(HashTableLoader.java:145)
    at org.apache.hadoop.hive.ql.exec.MapJoinOperator.loadHashTable(MapJoinOperator.java:201)
    at org.apache.hadoop.hive.ql.exec.MapJoinOperator.cleanUpInputFileChangedOp(MapJoinOperator.java:236)
    at org.apache.hadoop.hive.ql.exec.Operator.cleanUpInputFileChanged(Operator.java:1035)
    at org.apache.hadoop.hive.ql.exec.Operator.cleanUpInputFileChanged(Operator.java:1039)
    at org.apache.hadoop.hive.ql.exec.Operator.cleanUpInputFileChanged(Operator.java:1039)
    at org.apache.hadoop.hive.ql.exec.Operator.cleanUpInputFileChanged(Operator.java:1039)
    at org.apache.hadoop.hive.ql.exec.Operator.cleanUpInputFileChanged(Operator.java:1039)
    at org.apache.hadoop.hive.ql.exec.Operator.cleanUpInputFileChanged(Operator.java:1039)
    at org.apache.hadoop.hive.ql.exec.vector.VectorMapOperator.process(VectorMapOperator.java:37)
    at org.apache.hadoop.hive.ql.exec.tez.MapRecordSource.processRow(MapRecordSource.java:85)
    ... 16 more
Plan
OK STAGE DEPENDENCIES: Stage-1 is a root stage Stage-0 depends on stages: Stage-1 STAGE PLANS: Stage: Stage-1 Tez Edges: Map 11 <- Map 10 (BROADCAST_EDGE), Map 20 (BROADCAST_EDGE) Map 12 <- Map 14 (BROADCAST_EDGE) Map 16 <- Map 4 (BROADCAST_EDGE), Map 7 (BROADCAST_EDGE) Map 4 <- Map 1 (BROADCAST_EDGE), Map 15 (BROADCAST_EDGE), Map 18 (BROADCAST_EDGE), Map 19 (BROADCAST_EDGE), Map 2 (BROADCAST_EDGE), Map 3 (BROADCAST_EDGE), Map 5 (BROADCAST_EDGE), Map 6 (BROADCAST_EDGE), Map 8 (BROADCAST_EDGE), Map 9 (BROADCAST_EDGE), Reducer 13 (BROADCAST_EDGE) Map 5 <- Map 11 (BROADCAST_EDGE) Map 6 <- Map 21 (BROADCAST_EDGE) Reducer 13 <- Map 12 (SIMPLE_EDGE) Reducer 17 <- Map 16 (SIMPLE_EDGE) DagName: mmokhtar_20141013195656_e993c552-4b66-4bc4-8f22-3ca49c8727bb:14 Vertices: Map 1 Map Operator Tree: TableScan alias: d1 filterExpr: ((d_year = 2001) and d_date_sk is not null) (type: boolean) Statistics: Num rows: 73049 Data size: 81741831 Basic stats: COMPLETE Column stats: COMPLETE Filter Operator predicate: ((d_year = 2001) and d_date_sk is not null) (type: boolean) Statistics: Num rows: 652 Data size: 5216 Basic stats: COMPLETE Column stats: COMPLETE Select Operator expressions: d_date_sk (type: int) outputColumnNames: _col0 Statistics: Num rows: 652 Data size: 2608 Basic stats: COMPLETE Column stats: COMPLETE Reduce Output Operator key expressions: _col0 (type: int) sort order: + Map-reduce partition columns: _col0 (type: int) Statistics: Num rows: 652 Data size: 2608 Basic stats: COMPLETE Column stats: COMPLETE Select Operator expressions: _col0 (type: int) outputColumnNames: _col0 Statistics: Num rows: 652 Data size: 0 Basic stats: PARTIAL Column stats: COMPLETE Group By Operator keys: _col0 (type: int) mode: hash outputColumnNames: _col0 Statistics: Num rows: 652 Data size: 0 Basic stats: PARTIAL Column stats: COMPLETE Dynamic Partitioning Event Operator Target Input: store_sales Partition key expr: ss_sold_date_sk Statistics: Num rows: 652 Data size: 0 Basic stats: PARTIAL 
Column stats: COMPLETE Target column: ss_sold_date_sk Target Vertex: Map 4 Execution mode: vectorized Map 10 Map Operator Tree: TableScan alias: d1 filterExpr: d_date_sk is not null (type: boolean) Statistics: Num rows: 73049 Data size: 81741831 Basic stats: COMPLETE Column stats: COMPLETE Filter Operator predicate: d_date_sk is not null (type: boolean) Statistics: Num rows: 73049 Data size: 292196 Basic stats: COMPLETE Column stats: COMPLETE Select Operator expressions: d_date_sk (type: int) outputColumnNames: _col0 Statistics: Num rows: 73049 Data size: 292196 Basic stats: COMPLETE Column stats: COMPLETE Reduce Output Operator key expressions: _col0 (type: int) sort order: + Map-reduce partition columns: _col0 (type: int) Statistics: Num rows: 73049 Data size: 292196 Basic stats: COMPLETE Column stats: COMPLETE Execution mode: vectorized Map 11 Map Operator Tree: TableScan alias: customer filterExpr: (((((c_first_sales_date_sk is not null and c_first_shipto_date_sk is not null) and c_current_cdemo_sk is not null) and c_customer_sk is not null) and c_current_addr_sk is not null) and c_current_hdemo_sk is not null) (type: boolean) Statistics: Num rows: 1600000 Data size: 1241633212 Basic stats: COMPLETE Column stats: COMPLETE Filter Operator predicate: (((((c_first_sales_date_sk is not null and c_first_shipto_date_sk is not null) and c_current_cdemo_sk is not null) and c_customer_sk is not null) and c_current_addr_sk is not null) and c_current_hdemo_sk is not null) (type: boolean) Statistics: Num rows: 1387729 Data size: 32529300 Basic stats: COMPLETE Column stats: COMPLETE Select Operator expressions: c_customer_sk (type: int), c_current_cdemo_sk (type: int), c_current_hdemo_sk (type: int), c_current_addr_sk (type: int), c_first_shipto_date_sk (type: int), c_first_sales_date_sk (type: int) outputColumnNames: _col0, _col1, _col2, _col3, _col4, _col5 Statistics: Num rows: 1387729 Data size: 32529300 Basic stats: COMPLETE Column stats: COMPLETE Map Join Operator 
condition map: Inner Join 0 to 1 condition expressions: 0 {_col0} {_col1} {_col2} {_col3} {_col4} 1 keys: 0 _col5 (type: int) 1 _col0 (type: int) outputColumnNames: _col0, _col1, _col2, _col3, _col4 input vertices: 1 Map 10 Statistics: Num rows: 1551647 Data size: 31032940 Basic stats: COMPLETE Column stats: COMPLETE Map Join Operator condition map: Inner Join 0 to 1 condition expressions: 0 {_col0} {_col1} {_col2} {_col3} 1 keys: 0 _col4 (type: int) 1 _col0 (type: int) outputColumnNames: _col0, _col1, _col2, _col3 input vertices: 1 Map 20 Statistics: Num rows: 1734927 Data size: 27758832 Basic stats: COMPLETE Column stats: COMPLETE Select Operator expressions: _col0 (type: int), _col1 (type: int), _col2 (type: int), _col3 (type: int) outputColumnNames: _col0, _col1, _col2, _col3 Statistics: Num rows: 1734927 Data size: 27758832 Basic stats: COMPLETE Column stats: COMPLETE Reduce Output Operator key expressions: _col1 (type: int) sort order: + Map-reduce partition columns: _col1 (type: int) Statistics: Num rows: 1734927 Data size: 27758832 Basic stats: COMPLETE Column stats: COMPLETE value expressions: _col0 (type: int), _col2 (type: int), _col3 (type: int) Execution mode: vectorized Map 12 Map Operator Tree: TableScan alias: catalog_sales filterExpr: (cs_item_sk is not null and cs_order_number is not null) (type: boolean) Statistics: Num rows: 286549727 Data size: 37743959324 Basic stats: COMPLETE Column stats: COMPLETE Filter Operator predicate: (cs_item_sk is not null and cs_order_number is not null) (type: boolean) Statistics: Num rows: 286549727 Data size: 3435718732 Basic stats: COMPLETE Column stats: COMPLETE Select Operator expressions: cs_item_sk (type: int), cs_order_number (type: int), cs_ext_list_price (type: float) outputColumnNames: _col0, _col1, _col2 Statistics: Num rows: 286549727 Data size: 3435718732 Basic stats: COMPLETE Column stats: COMPLETE Map Join Operator condition map: Inner Join 0 to 1 condition expressions: 0 {_col0} {_col2} 1 {_col2} 
{_col3} {_col4} keys: 0 _col0 (type: int), _col1 (type: int) 1 _col0 (type: int), _col1 (type: int) outputColumnNames: _col0, _col2, _col5, _col6, _col7 input vertices: 1 Map 14 Statistics: Num rows: 7733966 Data size: 123743456 Basic stats: COMPLETE Column stats: COMPLETE Select Operator expressions: _col0 (type: int), _col2 (type: float), ((_col5 + _col6) + _col7) (type: float) outputColumnNames: _col0, _col1, _col2 Statistics: Num rows: 7733966 Data size: 123743456 Basic stats: COMPLETE Column stats: COMPLETE Group By Operator aggregations: sum(_col1), sum(_col2) keys: _col0 (type: int) mode: hash outputColumnNames: _col0, _col1, _col2 Statistics: Num rows: 14754 Data size: 295080 Basic stats: COMPLETE Column stats: COMPLETE Reduce Output Operator key expressions: _col0 (type: int) sort order: + Map-reduce partition columns: _col0 (type: int) Statistics: Num rows: 14754 Data size: 295080 Basic stats: COMPLETE Column stats: COMPLETE value expressions: _col1 (type: double), _col2 (type: double) Execution mode: vectorized Map 14 Map Operator Tree: TableScan alias: catalog_returns filterExpr: (cr_item_sk is not null and cr_order_number is not null) (type: boolean) Statistics: Num rows: 28798881 Data size: 2942039156 Basic stats: COMPLETE Column stats: COMPLETE Filter Operator predicate: (cr_item_sk is not null and cr_order_number is not null) (type: boolean) Statistics: Num rows: 28798881 Data size: 569059536 Basic stats: COMPLETE Column stats: COMPLETE Select Operator expressions: cr_item_sk (type: int), cr_order_number (type: int), cr_refunded_cash (type: float), cr_reversed_charge (type: float), cr_store_credit (type: float) outputColumnNames: _col0, _col1, _col2, _col3, _col4 Statistics: Num rows: 28798881 Data size: 569059536 Basic stats: COMPLETE Column stats: COMPLETE Reduce Output Operator key expressions: _col0 (type: int), _col1 (type: int) sort order: ++ Map-reduce partition columns: _col0 (type: int), _col1 (type: int) Statistics: Num rows: 28798881 Data 
size: 569059536 Basic stats: COMPLETE Column stats: COMPLETE value expressions: _col2 (type: float), _col3 (type: float), _col4 (type: float) Execution mode: vectorized Map 15 Map Operator Tree: TableScan alias: ad1 filterExpr: ca_address_sk is not null (type: boolean) Statistics: Num rows: 800000 Data size: 811903688 Basic stats: COMPLETE Column stats: COMPLETE Filter Operator predicate: ca_address_sk is not null (type: boolean) Statistics: Num rows: 800000 Data size: 3200000 Basic stats: COMPLETE Column stats: COMPLETE Select Operator expressions: ca_address_sk (type: int) outputColumnNames: _col0 Statistics: Num rows: 800000 Data size: 3200000 Basic stats: COMPLETE Column stats: COMPLETE Reduce Output Operator key expressions: _col0 (type: int) sort order: + Map-reduce partition columns: _col0 (type: int) Statistics: Num rows: 800000 Data size: 3200000 Basic stats: COMPLETE Column stats: COMPLETE Execution mode: vectorized Map 16 Map Operator Tree: TableScan alias: hd1 filterExpr: (hd_income_band_sk is not null and hd_demo_sk is not null) (type: boolean) Statistics: Num rows: 7200 Data size: 770400 Basic stats: COMPLETE Column stats: COMPLETE Filter Operator predicate: (hd_income_band_sk is not null and hd_demo_sk is not null) (type: boolean) Statistics: Num rows: 7200 Data size: 57600 Basic stats: COMPLETE Column stats: COMPLETE Select Operator expressions: hd_demo_sk (type: int), hd_income_band_sk (type: int) outputColumnNames: _col0, _col1 Statistics: Num rows: 7200 Data size: 57600 Basic stats: COMPLETE Column stats: COMPLETE Map Join Operator condition map: Inner Join 0 to 1 condition expressions: 0 {_col0} 1 keys: 0 _col1 (type: int) 1 _col0 (type: int) outputColumnNames: _col0 input vertices: 1 Map 7 Statistics: Num rows: 8000 Data size: 32000 Basic stats: COMPLETE Column stats: COMPLETE Map Join Operator condition map: Inner Join 0 to 1 condition expressions: 0 1 keys: 0 _col0 (type: int) 1 _col19 (type: int) input vertices: 1 Map 4 Statistics: Num rows: 
90416698032652288 Data size: 0 Basic stats: PARTIAL Column stats: NONE Select Operator Statistics: Num rows: 90416698032652288 Data size: 0 Basic stats: PARTIAL Column stats: NONE Group By Operator aggregations: count() mode: hash outputColumnNames: _col0 Statistics: Num rows: 1 Data size: 8 Basic stats: COMPLETE Column stats: NONE Reduce Output Operator sort order: Statistics: Num rows: 1 Data size: 8 Basic stats: COMPLETE Column stats: NONE value expressions: _col0 (type: bigint) Execution mode: vectorized Map 18 Map Operator Tree: TableScan alias: promotion filterExpr: p_promo_sk is not null (type: boolean) Statistics: Num rows: 450 Data size: 530848 Basic stats: COMPLETE Column stats: COMPLETE Filter Operator predicate: p_promo_sk is not null (type: boolean) Statistics: Num rows: 450 Data size: 1800 Basic stats: COMPLETE Column stats: COMPLETE Select Operator expressions: p_promo_sk (type: int) outputColumnNames: _col0 Statistics: Num rows: 450 Data size: 1800 Basic stats: COMPLETE Column stats: COMPLETE Reduce Output Operator key expressions: _col0 (type: int) sort order: + Map-reduce partition columns: _col0 (type: int) Statistics: Num rows: 450 Data size: 1800 Basic stats: COMPLETE Column stats: COMPLETE Execution mode: vectorized Map 19 Map Operator Tree: TableScan alias: ad1 filterExpr: ca_address_sk is not null (type: boolean) Statistics: Num rows: 800000 Data size: 811903688 Basic stats: COMPLETE Column stats: COMPLETE Filter Operator predicate: ca_address_sk is not null (type: boolean) Statistics: Num rows: 800000 Data size: 3200000 Basic stats: COMPLETE Column stats: COMPLETE Select Operator expressions: ca_address_sk (type: int) outputColumnNames: _col0 Statistics: Num rows: 800000 Data size: 3200000 Basic stats: COMPLETE Column stats: COMPLETE Reduce Output Operator key expressions: _col0 (type: int) sort order: + Map-reduce partition columns: _col0 (type: int) Statistics: Num rows: 800000 Data size: 3200000 Basic stats: COMPLETE Column stats: 
COMPLETE Execution mode: vectorized Map 2 Map Operator Tree: TableScan alias: cd1 filterExpr: cd_demo_sk is not null (type: boolean) Statistics: Num rows: 1920800 Data size: 718379200 Basic stats: COMPLETE Column stats: COMPLETE Filter Operator predicate: cd_demo_sk is not null (type: boolean) Statistics: Num rows: 1920800 Data size: 170951200 Basic stats: COMPLETE Column stats: COMPLETE Select Operator expressions: cd_demo_sk (type: int), cd_marital_status (type: string) outputColumnNames: _col0, _col1 Statistics: Num rows: 1920800 Data size: 170951200 Basic stats: COMPLETE Column stats: COMPLETE Reduce Output Operator key expressions: _col0 (type: int) sort order: + Map-reduce partition columns: _col0 (type: int) Statistics: Num rows: 1920800 Data size: 170951200 Basic stats: COMPLETE Column stats: COMPLETE value expressions: _col1 (type: string) Execution mode: vectorized Map 20 Map Operator Tree: TableScan alias: d1 filterExpr: d_date_sk is not null (type: boolean) Statistics: Num rows: 73049 Data size: 81741831 Basic stats: COMPLETE Column stats: COMPLETE Filter Operator predicate: d_date_sk is not null (type: boolean) Statistics: Num rows: 73049 Data size: 292196 Basic stats: COMPLETE Column stats: COMPLETE Select Operator expressions: d_date_sk (type: int) outputColumnNames: _col0 Statistics: Num rows: 73049 Data size: 292196 Basic stats: COMPLETE Column stats: COMPLETE Reduce Output Operator key expressions: _col0 (type: int) sort order: + Map-reduce partition columns: _col0 (type: int) Statistics: Num rows: 73049 Data size: 292196 Basic stats: COMPLETE Column stats: COMPLETE Execution mode: vectorized Map 21 Map Operator Tree: TableScan alias: ib1 filterExpr: ib_income_band_sk is not null (type: boolean) Statistics: Num rows: 20 Data size: 240 Basic stats: COMPLETE Column stats: COMPLETE Filter Operator predicate: ib_income_band_sk is not null (type: boolean) Statistics: Num rows: 20 Data size: 80 Basic stats: COMPLETE Column stats: COMPLETE Select 
Operator expressions: ib_income_band_sk (type: int) outputColumnNames: _col0 Statistics: Num rows: 20 Data size: 80 Basic stats: COMPLETE Column stats: COMPLETE Reduce Output Operator key expressions: _col0 (type: int) sort order: + Map-reduce partition columns: _col0 (type: int) Statistics: Num rows: 20 Data size: 80 Basic stats: COMPLETE Column stats: COMPLETE Execution mode: vectorized Map 3 Map Operator Tree: TableScan alias: item filterExpr: ((((i_color) IN ('maroon', 'burnished', 'dim', 'steel', 'navajo', 'chocolate') and i_current_price BETWEEN 35 AND 45) and i_current_price BETWEEN 36 AND 50) and i_item_sk is not null) (type: boolean) Statistics: Num rows: 48000 Data size: 68732712 Basic stats: COMPLETE Column stats: COMPLETE Filter Operator predicate: ((((i_color) IN ('maroon', 'burnished', 'dim', 'steel', 'navajo', 'chocolate') and i_current_price BETWEEN 35 AND 45) and i_current_price BETWEEN 36 AND 50) and i_item_sk is not null) (type: boolean) Statistics: Num rows: 6000 Data size: 581936 Basic stats: COMPLETE Column stats: COMPLETE Select Operator expressions: i_item_sk (type: int) outputColumnNames: _col0 Statistics: Num rows: 6000 Data size: 24000 Basic stats: COMPLETE Column stats: COMPLETE Reduce Output Operator key expressions: _col0 (type: int) sort order: + Map-reduce partition columns: _col0 (type: int) Statistics: Num rows: 6000 Data size: 24000 Basic stats: COMPLETE Column stats: COMPLETE Execution mode: vectorized Map 4 Map Operator Tree: TableScan alias: store_sales filterExpr: (((((((ss_item_sk is not null and ss_store_sk is not null) and ss_cdemo_sk is not null) and ss_customer_sk is not null) and ss_ticket_number is not null) and ss_addr_sk is not null) and ss_promo_sk is not null) and ss_hdemo_sk is not null) (type: boolean) Statistics: Num rows: 550076554 Data size: 47370018896 Basic stats: COMPLETE Column stats: COMPLETE Filter Operator predicate: (((((((ss_item_sk is not null and ss_store_sk is not null) and ss_cdemo_sk is not null) 
and ss_customer_sk is not null) and ss_ticket_number is not null) and ss_addr_sk is not null) and ss_promo_sk is not null) and ss_hdemo_sk is not null) (type: boolean) Statistics: Num rows: 476766967 Data size: 14987001212 Basic stats: COMPLETE Column stats: COMPLETE Select Operator expressions: ss_item_sk (type: int), ss_customer_sk (type: int), ss_cdemo_sk (type: int), ss_hdemo_sk (type: int), ss_addr_sk (type: int), ss_store_sk (type: int), ss_promo_sk (type: int), ss_ticket_number (type: int), ss_sold_date_sk (type: int) outputColumnNames: _col0, _col1, _col2, _col3, _col4, _col5, _col6, _col7, _col8 Statistics: Num rows: 476766967 Data size: 16894069080 Basic stats: COMPLETE Column stats: COMPLETE Map Join Operator condition map: Inner Join 0 to 1 condition expressions: 0 {_col0} {_col1} {_col2} {_col3} {_col4} {_col5} {_col6} {_col7} {_col8} 1 keys: 0 _col0 (type: int) 1 _col0 (type: int) outputColumnNames: _col0, _col1, _col2, _col3, _col4, _col5, _col6, _col7, _col8 input vertices: 1 Map 3 Statistics: Num rows: 365759084 Data size: 13167327024 Basic stats: COMPLETE Column stats: COMPLETE Map Join Operator condition map: Inner Join 0 to 1 condition expressions: 0 {_col0} {_col1} {_col2} {_col3} {_col4} {_col5} {_col6} {_col7} 1 keys: 0 _col8 (type: int) 1 _col0 (type: int) outputColumnNames: _col0, _col1, _col2, _col3, _col4, _col5, _col6, _col7 input vertices: 1 Map 1 Statistics: Num rows: 408347470 Data size: 13067119040 Basic stats: COMPLETE Column stats: COMPLETE Select Operator expressions: _col0 (type: int), _col1 (type: int), _col2 (type: int), _col3 (type: int), _col4 (type: int), _col5 (type: int), _col6 (type: int), _col7 (type: int) outputColumnNames: _col0, _col1, _col2, _col3, _col4, _col5, _col6, _col7 Statistics: Num rows: 408347470 Data size: 13067119040 Basic stats: COMPLETE Column stats: COMPLETE Map Join Operator condition map: Inner Join 0 to 1 condition expressions: 0 1 {_col0} {_col1} {_col2} {_col3} {_col4} {_col6} {_col7} keys: 0 
_col0 (type: int) 1 _col5 (type: int) outputColumnNames: _col1, _col2, _col3, _col4, _col5, _col7, _col8 input vertices: 0 Map 9 Statistics: Num rows: 1095818527 Data size: 30682918756 Basic stats: COMPLETE Column stats: COMPLETE Select Operator expressions: _col1 (type: int), _col2 (type: int), _col3 (type: int), _col4 (type: int), _col5 (type: int), _col7 (type: int), _col8 (type: int) outputColumnNames: _col1, _col2, _col3, _col4, _col5, _col7, _col8 Statistics: Num rows: 1095818527 Data size: 30682918756 Basic stats: COMPLETE Column stats: COMPLETE Map Join Operator condition map: Inner Join 0 to 1 condition expressions: 0 {_col1} 1 {_col1} {_col2} {_col4} {_col5} {_col7} {_col8} keys: 0 _col0 (type: int) 1 _col3 (type: int) outputColumnNames: _col1, _col3, _col4, _col6, _col7, _col9, _col10 input vertices: 0 Map 2 Statistics: Num rows: 26284318514 Data size: 2864990718026 Basic stats: COMPLETE Column stats: COMPLETE Select Operator expressions: _col1 (type: string), _col10 (type: int), _col3 (type: int), _col4 (type: int), _col6 (type: int), _col7 (type: int), _col9 (type: int) outputColumnNames: _col1, _col10, _col3, _col4, _col6, _col7, _col9 Statistics: Num rows: 26284318514 Data size: 2864990718026 Basic stats: COMPLETE Column stats: COMPLETE Map Join Operator condition map: Inner Join 0 to 1 condition expressions: 0 {_col1} {_col4} {_col5} 1 {_col1} {_col3} {_col6} {_col7} {_col9} {_col10} keys: 0 _col2 (type: int) 1 _col4 (type: int) outputColumnNames: _col1, _col4, _col5, _col11, _col13, _col16, _col17, _col19, _col20 input vertices: 0 Map 5 Statistics: Num rows: 1259845072505 Data size: 137323112903045 Basic stats: COMPLETE Column stats: COMPLETE Filter Operator predicate: (_col11 <> _col1) (type: boolean) Statistics: Num rows: 1259845072505 Data size: 137323112903045 Basic stats: COMPLETE Column stats: COMPLETE Select Operator expressions: _col13 (type: int), _col16 (type: int), _col17 (type: int), _col19 (type: int), _col20 (type: int), _col4 (type: 
int), _col5 (type: int) outputColumnNames: _col13, _col16, _col17, _col19, _col20, _col4, _col5 Statistics: Num rows: 1259845072505 Data size: 30236281740120 Basic stats: COMPLETE Column stats: COMPLETE Map Join Operator condition map: Inner Join 0 to 1 condition expressions: 0 1 {_col4} {_col5} {_col13} {_col16} {_col17} {_col19} keys: 0 _col0 (type: int), _col1 (type: int) 1 _col13 (type: int), _col20 (type: int) outputColumnNames: _col6, _col7, _col15, _col18, _col19, _col21 input vertices: 0 Map 8 Statistics: Num rows: 102517810489 Data size: 2050356209780 Basic stats: COMPLETE Column stats: COMPLETE Select Operator expressions: _col15 (type: int), _col6 (type: int), _col7 (type: int), _col18 (type: int), _col19 (type: int), _col21 (type: int) outputColumnNames: _col0, _col13, _col14, _col3, _col4, _col6 Statistics: Num rows: 102517810489 Data size: 2050356209780 Basic stats: COMPLETE Column stats: COMPLETE Map Join Operator condition map: Inner Join 0 to 1 condition expressions: 0 1 {_col0} {_col3} {_col6} {_col13} {_col14} keys: 0 _col0 (type: int) 1 _col4 (type: int) outputColumnNames: _col1, _col4, _col7, _col14, _col15 input vertices: 0 Map 15 Statistics: Num rows: 13141203075020 Data size: 210259249200320 Basic stats: COMPLETE Column stats: COMPLETE Select Operator expressions: _col1 (type: int), _col14 (type: int), _col15 (type: int), _col4 (type: int), _col7 (type: int) outputColumnNames: _col1, _col14, _col15, _col4, _col7 Statistics: Num rows: 13141203075020 Data size: 210259249200320 Basic stats: COMPLETE Column stats: COMPLETE Map Join Operator condition map: Inner Join 0 to 1 condition expressions: 0 1 {_col1} {_col4} {_col7} {_col14} keys: 0 _col0 (type: int) 1 _col15 (type: int) outputColumnNames: _col2, _col5, _col8, _col15 input vertices: 0 Map 19 Statistics: Num rows: 239649914744597 Data size: 2875798976935164 Basic stats: COMPLETE Column stats: COMPLETE Map Join Operator condition map: Inner Join 0 to 1 condition expressions: 0 {_col5} 
{_col8} {_col15} 1 keys: 0 _col2 (type: int) 1 _col0 (type: int) outputColumnNames: _col5, _col8, _col15 input vertices: 1 Reducer 13 Statistics: Num rows: 239649914744597 Data size: 1917199317956776 Basic stats: COMPLETE Column stats: COMPLETE Select Operator expressions: _col15 (type: int), _col5 (type: int), _col8 (type: int) outputColumnNames: _col15, _col5, _col8 Statistics: Num rows: 239649914744597 Data size: 1917199317956776 Basic stats: COMPLETE Column stats: COMPLETE Map Join Operator condition map: Inner Join 0 to 1 condition expressions: 0 1 {_col5} {_col15} keys: 0 _col0 (type: int) 1 _col8 (type: int) outputColumnNames: _col6, _col16 input vertices: 0 Map 18 Statistics: Num rows: 6740153852191791 Data size: 26960615408767164 Basic stats: COMPLETE Column stats: COMPLETE Select Operator expressions: _col16 (type: int), _col6 (type: int) outputColumnNames: _col16, _col6 Statistics: Num rows: 6740153852191791 Data size: 26960615408767164 Basic stats: COMPLETE Column stats: COMPLETE Map Join Operator condition map: Inner Join 0 to 1 condition expressions: 0 1 {_col16} keys: 0 _col0 (type: int) 1 _col6 (type: int) outputColumnNames: _col19 input vertices: 0 Map 6 Statistics: Num rows: 82196998197460864 Data size: 0 Basic stats: PARTIAL Column stats: COMPLETE Select Operator expressions: _col19 (type: int) outputColumnNames: _col19 Statistics: Num rows: 82196998197460864 Data size: 0 Basic stats: PARTIAL Column stats: COMPLETE Reduce Output Operator key expressions: _col19 (type: int) sort order: + Map-reduce partition columns: _col19 (type: int) Statistics: Num rows: 82196998197460864 Data size: 0 Basic stats: PARTIAL Column stats: COMPLETE Execution mode: vectorized Map 5 Map Operator Tree: TableScan alias: cd1 filterExpr: cd_demo_sk is not null (type: boolean) Statistics: Num rows: 1920800 Data size: 718379200 Basic stats: COMPLETE Column stats: COMPLETE Filter Operator predicate: cd_demo_sk is not null (type: boolean) Statistics: Num rows: 1920800 Data 
size: 170951200 Basic stats: COMPLETE Column stats: COMPLETE Select Operator expressions: cd_demo_sk (type: int), cd_marital_status (type: string) outputColumnNames: _col0, _col1 Statistics: Num rows: 1920800 Data size: 170951200 Basic stats: COMPLETE Column stats: COMPLETE Map Join Operator condition map: Inner Join 0 to 1 condition expressions: 0 {_col1} 1 {_col0} {_col2} {_col3} keys: 0 _col0 (type: int) 1 _col1 (type: int) outputColumnNames: _col1, _col2, _col4, _col5 input vertices: 1 Map 11 Statistics: Num rows: 3675622 Data size: 44107464 Basic stats: COMPLETE Column stats: COMPLETE Reduce Output Operator key expressions: _col2 (type: int) sort order: + Map-reduce partition columns: _col2 (type: int) Statistics: Num rows: 3675622 Data size: 44107464 Basic stats: COMPLETE Column stats: COMPLETE value expressions: _col1 (type: string), _col4 (type: int), _col5 (type: int) Execution mode: vectorized Map 6 Map Operator Tree: TableScan alias: hd1 filterExpr: (hd_income_band_sk is not null and hd_demo_sk is not null) (type: boolean) Statistics: Num rows: 7200 Data size: 770400 Basic stats: COMPLETE Column stats: COMPLETE Filter Operator predicate: (hd_income_band_sk is not null and hd_demo_sk is not null) (type: boolean) Statistics: Num rows: 7200 Data size: 57600 Basic stats: COMPLETE Column stats: COMPLETE Select Operator expressions: hd_demo_sk (type: int), hd_income_band_sk (type: int) outputColumnNames: _col0, _col1 Statistics: Num rows: 7200 Data size: 57600 Basic stats: COMPLETE Column stats: COMPLETE Map Join Operator condition map: Inner Join 0 to 1 condition expressions: 0 {_col0} 1 keys: 0 _col1 (type: int) 1 _col0 (type: int) outputColumnNames: _col0 input vertices: 1 Map 21 Statistics: Num rows: 8000 Data size: 32000 Basic stats: COMPLETE Column stats: COMPLETE Reduce Output Operator key expressions: _col0 (type: int) sort order: + Map-reduce partition columns: _col0 (type: int) Statistics: Num rows: 8000 Data size: 32000 Basic stats: COMPLETE Column 
stats: COMPLETE
  Execution mode: vectorized
Map 7
  Map Operator Tree:
    TableScan
      alias: ib1
      filterExpr: ib_income_band_sk is not null (type: boolean)
      Statistics: Num rows: 20 Data size: 240 Basic stats: COMPLETE Column stats: COMPLETE
      Filter Operator
        predicate: ib_income_band_sk is not null (type: boolean)
        Statistics: Num rows: 20 Data size: 80 Basic stats: COMPLETE Column stats: COMPLETE
        Select Operator
          expressions: ib_income_band_sk (type: int)
          outputColumnNames: _col0
          Statistics: Num rows: 20 Data size: 80 Basic stats: COMPLETE Column stats: COMPLETE
          Reduce Output Operator
            key expressions: _col0 (type: int)
            sort order: +
            Map-reduce partition columns: _col0 (type: int)
            Statistics: Num rows: 20 Data size: 80 Basic stats: COMPLETE Column stats: COMPLETE
  Execution mode: vectorized
Map 8
  Map Operator Tree:
    TableScan
      alias: store_returns
      filterExpr: (sr_item_sk is not null and sr_ticket_number is not null) (type: boolean)
      Statistics: Num rows: 55578005 Data size: 4155315616 Basic stats: COMPLETE Column stats: COMPLETE
      Filter Operator
        predicate: (sr_item_sk is not null and sr_ticket_number is not null) (type: boolean)
        Statistics: Num rows: 55578005 Data size: 444624040 Basic stats: COMPLETE Column stats: COMPLETE
        Select Operator
          expressions: sr_item_sk (type: int), sr_ticket_number (type: int)
          outputColumnNames: _col0, _col1
          Statistics: Num rows: 55578005 Data size: 444624040 Basic stats: COMPLETE Column stats: COMPLETE
          Reduce Output Operator
            key expressions: _col0 (type: int), _col1 (type: int)
            sort order: ++
            Map-reduce partition columns: _col0 (type: int), _col1 (type: int)
            Statistics: Num rows: 55578005 Data size: 444624040 Basic stats: COMPLETE Column stats: COMPLETE
  Execution mode: vectorized
Map 9
  Map Operator Tree:
    TableScan
      alias: store
      filterExpr: s_store_sk is not null (type: boolean)
      Statistics: Num rows: 212 Data size: 405680 Basic stats: COMPLETE Column stats: COMPLETE
      Filter Operator
        predicate: s_store_sk is not null (type: boolean)
        Statistics: Num rows: 212 Data size: 848 Basic stats: COMPLETE Column stats: COMPLETE
        Select Operator
          expressions: s_store_sk (type: int)
          outputColumnNames: _col0
          Statistics: Num rows: 212 Data size: 848 Basic stats: COMPLETE Column stats: COMPLETE
          Reduce Output Operator
            key expressions: _col0 (type: int)
            sort order: +
            Map-reduce partition columns: _col0 (type: int)
            Statistics: Num rows: 212 Data size: 848 Basic stats: COMPLETE Column stats: COMPLETE
  Execution mode: vectorized
Reducer 13
  Reduce Operator Tree:
    Group By Operator
      aggregations: sum(VALUE._col0), sum(VALUE._col1)
      keys: KEY._col0 (type: int)
      mode: mergepartial
      outputColumnNames: _col0, _col1, _col2
      Statistics: Num rows: 14754 Data size: 354096 Basic stats: COMPLETE Column stats: COMPLETE
      Filter Operator
        predicate: (_col1 > (UDFToDouble(2) * _col2)) (type: boolean)
        Statistics: Num rows: 4918 Data size: 118032 Basic stats: COMPLETE Column stats: COMPLETE
        Select Operator
          expressions: _col0 (type: int)
          outputColumnNames: _col0
          Statistics: Num rows: 4918 Data size: 39344 Basic stats: COMPLETE Column stats: COMPLETE
          Reduce Output Operator
            key expressions: _col0 (type: int)
            sort order: +
            Map-reduce partition columns: _col0 (type: int)
            Statistics: Num rows: 4918 Data size: 39344 Basic stats: COMPLETE Column stats: COMPLETE
  Execution mode: vectorized
Reducer 17
  Reduce Operator Tree:
    Group By Operator
      aggregations: count(VALUE._col0)
      mode: mergepartial
      outputColumnNames: _col0
      Statistics: Num rows: 1 Data size: 8 Basic stats: COMPLETE Column stats: NONE
      Select Operator
        expressions: _col0 (type: bigint)
        outputColumnNames: _col0
        Statistics: Num rows: 1 Data size: 8 Basic stats: COMPLETE Column stats: NONE
        File Output Operator
          compressed: false
          Statistics: Num rows: 1 Data size: 8 Basic stats: COMPLETE Column stats: NONE
          table:
            input format: org.apache.hadoop.mapred.TextInputFormat
            output format: org.apache.hadoop.hive.ql.io.HiveIgnoreKeyTextOutputFormat
            serde: org.apache.hadoop.hive.serde2.lazy.LazySimpleSerDe
  Execution mode: vectorized

Stage: Stage-0
  Fetch Operator
    limit: -1
    Processor Tree:
      ListSink

Time taken: 12.6 seconds, Fetched: 738 row(s)
It looks like an overflow is happening: the key count derived from statistics exceeds the int range and is clamped to Integer.MAX_VALUE, and nextHighestPowerOfTwo then overflows to Integer.MIN_VALUE.
2014-10-13 23:18:08,215 INFO [TezChild] org.apache.hadoop.hive.ql.exec.persistence.HashMapWrapper: Key count from statistics is 82196998197460864; setting map size to 2147483647
2014-10-13 23:18:08,215 INFO [TezChild] org.apache.hadoop.hive.ql.exec.persistence.BytesBytesMultiHashMap: initialCapacity in :2147483647
2014-10-13 23:18:08,215 INFO [TezChild] org.apache.hadoop.hive.ql.exec.persistence.BytesBytesMultiHashMap: initialCapacity out :-2147483648
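The arithmetic behind these log lines can be reproduced in isolation. The following is a minimal sketch, not Hive's actual code: the class and method names (CapacityOverflowSketch, clampToInt, naiveNextPowerOfTwo, safeNextPowerOfTwo) are hypothetical, and the rounding logic is only assumed to resemble what nextHighestPowerOfTwo does. It shows how rounding Integer.MAX_VALUE up to a power of two shifts into the sign bit, and one possible guard that caps the result at 2^30.

```java
// Hypothetical sketch; names are illustrative and not taken from Hive's source.
final class CapacityOverflowSketch {
    // Largest power of two that still fits in a positive int (2^30).
    static final int MAX_POWER_OF_TWO = 1 << 30;

    // Clamp a long key-count estimate into the int range,
    // mirroring the "setting map size to 2147483647" log line.
    static int clampToInt(long keyCount) {
        return (int) Math.min(keyCount, (long) Integer.MAX_VALUE);
    }

    // Naive rounding: return v if it is already a power of two,
    // otherwise double its highest set bit. For v > 2^30 the shift
    // lands in the sign bit and the result is Integer.MIN_VALUE.
    static int naiveNextPowerOfTwo(int v) {
        int highest = Integer.highestOneBit(v);
        return (highest == v) ? v : highest << 1;
    }

    // Guarded rounding (an assumption about one reasonable fix):
    // treat values below 2 as 1 and cap the result at 2^30.
    static int safeNextPowerOfTwo(int v) {
        if (v <= 1) {
            return 1;
        }
        if (v >= MAX_POWER_OF_TWO) {
            return MAX_POWER_OF_TWO;
        }
        return naiveNextPowerOfTwo(v);
    }

    public static void main(String[] args) {
        int clamped = clampToInt(82196998197460864L);
        System.out.println("clamped key count: " + clamped);
        System.out.println("naive capacity:    " + naiveNextPowerOfTwo(clamped));
        System.out.println("guarded capacity:  " + safeNextPowerOfTwo(clamped));
    }
}
```

With the key count from the log, the naive version returns -2147483648, which matches the "initialCapacity out" value, while the guarded version returns 1073741824.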
Attachments
Issue Links
- blocks
-
HIVE-8580 Support LateralViewJoinOperator and LateralViewForwardOperator in stats annotation
- Closed
- links to