Details
-
Bug
-
Status: Resolved
-
Blocker
-
Resolution: Fixed
-
Impala 2.10.0
-
ghx-label-9
Description
SELECT COALESCE() on multiple arguments spanning different tables with an OUTER JOIN on Kudu tables is not properly being applied. This behavior is
- different relative to Kudu tables in 2.9
- different relative to the 2.10 behavior with HDFS, seemingly making this Kudu-specific
- different from Postgres, which matches the HDFS behavior, further making this seem Kudu-specific
Consider this query:
USE tpch_kudu; SELECT COALESCE(a2.n_nationkey, a1.p_size), a2.n_nationkey, a1.p_size FROM part a1 LEFT JOIN nation a2 ON (a1.p_size) = (a2.n_nationkey);
Some of the rows returned include:
+-------------------------------------+-------------+--------+ | coalesce(a2.n_nationkey, a1.p_size) | n_nationkey | p_size | +-------------------------------------+-------------+--------+ [snip] | 21 | 21 | 21 | | 22 | 22 | 22 | | 23 | 23 | 23 | | 24 | 24 | 24 | | NULL | NULL | 25 | | NULL | NULL | 26 | | NULL | NULL | 27 | [snip]
The COALESCE() column is not returning the value of p_size when its first argument, n_nationkey is NULL. tpch_kudu.nation n_nationkey has values between 0 and 24, hence the NULL values in that column when part.p_size is greater.
This goes away if you keep the query above but switch the ordering of the COALESCE() arguments.
I can see the same sort of problems if I write similar RIGHT or FULL OUTER JOIN queries:
USE tpch_kudu; SELECT DISTINCT COALESCE(a2.n_nationkey, a1.p_size), a2.n_nationkey, a1.p_size FROM part a1 FULL OUTER JOIN nation a2 ON (a1.p_size) = (a2.n_nationkey) ORDER BY 1,2,3;
USE tpch_kudu; SELECT DISTINCT COALESCE(a2.n_nationkey, a1.p_size), a2.n_nationkey, a1.p_size FROM nation a2 RIGHT JOIN part a1 ON (a1.p_size) = (a2.n_nationkey) ORDER BY 1,2,3;
Explain-level 2 plans and profiles will be attached.
Attachments
Attachments
Issue Links
- is broken by
-
IMPALA-5016 Missed opportunities for static partition pruning with COALESCE()
- Resolved
-
IMPALA-1861 Conditional functions with constant arguments should be simplified during analysis
- Resolved