Details
- Type: Sub-task
- Status: Resolved
- Priority: Major
- Resolution: Fixed
- Version: 3.4.0
Description
Example query:

spark.sql("select *, (select (count(1)) is null from t1 where t0.a = t1.c) from t0").collect()
res6: Array[org.apache.spark.sql.Row] = Array([1,1.0,null], [2,2.0,false])
In this subquery, count(1) always evaluates to a non-null integer (0 for an empty group), so count(1) is null is always false. The correct evaluation of the subquery is therefore always false.
We incorrectly evaluate it to null for empty groups. The reason is that NullPropagation rewrites Aggregate [c] [isnull(count(1))] to Aggregate [c] [false]. This rewrite would be correct in isolation, but in the context of a scalar subquery it breaks the count bug handling in RewriteCorrelatedScalarSubquery.constructLeftJoins. By the time we get there, the query no longer appears to have the count bug: it looks the same as if the original query had contained a subquery like select any_value(false) from r..., and that case is not subject to the count bug.
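The mechanics above can be sketched in plain Scala (this models the semantics only; CountBugSketch, correctEval, and leftJoinEval are illustrative names, not Spark internals). correctEval evaluates the subquery per row of t0, where count(1) is never null; leftJoinEval models the left-join rewrite after NullPropagation has folded isnull(count(1)) to the literal false, so a t0 row with no matching group in t1 gets NULL (None) from the join:

```scala
object CountBugSketch {
  val t0 = Seq((1, 1.0), (2, 2.0)) // view t0 (a, b) from the issue's DDL
  val t1 = Seq((2, 3.0))           // view t1 (c, d)

  // Correct semantics: count(1) over the matching rows of t1 is 0 for an
  // empty group, never NULL, so `count(1) is null` is always false.
  def correctEval(a: Int): Option[Boolean] = {
    val cnt: Option[Long] = Some(t1.count { case (c, _) => c == a }.toLong)
    Some(cnt.isEmpty) // always Some(false)
  }

  // Buggy rewrite: NullPropagation has turned the aggregate output into the
  // literal `false` per group (Aggregate [c] [false]); the left outer join
  // then yields NULL (None) for t0 rows whose group is absent from t1.
  def leftJoinEval(a: Int): Option[Boolean] =
    t1.groupBy { case (c, _) => c }
      .map { case (c, _) => c -> false }
      .get(a) // missing group -> None, reproducing the null in [1,1.0,null]

  def results(eval: Int => Option[Boolean]): Seq[Option[Boolean]] =
    t0.map { case (a, _) => eval(a) }
}
```

Running both evaluators over t0 reproduces the divergence in the example: the correct evaluation yields false for both rows, while the left-join rewrite yields None (null) for a = 1.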
A Postgres comparison shows the correct always-false result: http://sqlfiddle.com/#!17/67822/5
DDL for the example:

create or replace temp view t0 (a, b) as values (1, 1.0), (2, 2.0);
create or replace temp view t1 (c, d) as values (2, 3.0);