Details
-
Bug
-
Status: Resolved
-
Critical
-
Resolution: Fixed
-
Impala 2.3.0, Impala 2.5.0
Description
Thanks to sobrien05 from the user list for reporting this!
Original post:
http://community.cloudera.com/t5/Interactive-Short-cycle-SQL/Predict-push-down-bug/m-p/38516
Repro tables and data:
create table orders(order_id int, address_id int); insert into table orders values((1, 10), (2, 20), (3, 30)); create table addresses(address_id int, state string); insert into table addresses values((10, "CA"), (20, "NY"), (30, "CA"));
Repro query and bad plan:
Select * FROM ( Select shipping_addr.state, count(order_id) as num_orders FROM ( Select order_id, address_id FROM orders ) order_data LEFT OUTER JOIN [shuffle] ( Select address_id, state FROM addresses ) shipping_addr ON order_data.address_id = shipping_addr.address_id GROUP BY shipping_addr.state ) unf WHERE lower(state) = lower('NY') +------------------------------------------------------------------------------------+ | Explain String | +------------------------------------------------------------------------------------+ | Estimated Per-Host Requirements: Memory=2.16GB VCores=2 | | WARNING: The following tables are missing relevant table and/or column statistics. | | debug.addresses, debug.orders | | | | 08:EXCHANGE [UNPARTITIONED] | | | | | 07:AGGREGATE [FINALIZE] | | | output: count:merge(order_id) | | | group by: shipping_addr.state | | | | | 06:EXCHANGE [HASH(shipping_addr.state)] | | | | | 03:AGGREGATE | | | output: count(order_id) | | | group by: state | | | | | 02:HASH JOIN [LEFT OUTER JOIN, PARTITIONED] | | | hash predicates: address_id = address_id | | | | | |--05:EXCHANGE [HASH(address_id)] | | | | | | | 01:SCAN HDFS [debug.addresses] | | | partitions=1/1 files=1 size=18B | | | predicates: lower(debug.addresses.state) = lower('NY') <-- should also be assigned at node 02 | | | | | 04:EXCHANGE [HASH(address_id)] | | | | | 00:SCAN HDFS [debug.orders] | | partitions=1/1 files=1 size=15B | +------------------------------------------------------------------------------------+
Workaround
Use an INNER JOIN instead of a LEFT OUTER JOIN. The query still has the same meaning.