Details
Type: Bug
Status: Resolved
Priority: Major
Resolution: Fixed
Versions: 3.1.3, 3.2.2, 3.3.1, 3.4.0
Description
Example:
select c1, explode(c4) as c5 from (
  select c1, array(c3) as c4 from (
    select c1, explode_outer(c2) as c3
    from values
      (1, array(1, 2)),
      (2, array(2, 3)),
      (3, null)
    as data(c1, c2)
  )
);

+---+---+
|c1 |c5 |
+---+---+
|1  |1  |
|1  |2  |
|2  |2  |
|2  |3  |
|3  |0  |
+---+---+
In the last row, c5 is 0, but should be NULL.
Another example:
select c1, exists(c4, x -> x is null) as c5 from (
  select c1, array(c3) as c4 from (
    select c1, explode_outer(c2) as c3
    from values
      (1, array(1, 2)),
      (2, array(2, 3)),
      (3, null)
    as data(c1, c2)
  )
);

+---+-----+
|c1 |c5   |
+---+-----+
|1  |false|
|1  |false|
|2  |false|
|2  |false|
|3  |false|
+---+-----+
In the last row, false should be true.
In both cases, at the time CreateArray(c3) is instantiated, c3's nullability is incorrect because the new projection created by ExtractGenerator uses generatorOutput from explode_outer(c2) as a projection list. generatorOutput doesn't take into account that explode_outer(c2) is an outer explode, so the nullability setting is lost.
UpdateAttributeNullability will eventually fix the nullable setting for attributes referring to c3, but it doesn't fix the containsNull setting for c4 in explode(c4) (from the first example) or exists(c4, x -> x is null) (from the second example).
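The ordering problem described above can be sketched with a small toy model. This is not Spark code: Attribute, ArrayType and create_array are hypothetical stand-ins invented for illustration, and the only assumption is the one stated in the description, namely that the array's element nullability is derived from c3 before c3's own nullability is corrected.

```python
from dataclasses import dataclass, replace


@dataclass(frozen=True)
class Attribute:
    """Hypothetical stand-in for a column reference with a nullable flag."""
    name: str
    nullable: bool


@dataclass(frozen=True)
class ArrayType:
    """Hypothetical stand-in for an array type; contains_null mirrors containsNull."""
    contains_null: bool


def create_array(child: Attribute) -> ArrayType:
    # Element nullability is copied from the child at construction time.
    return ArrayType(contains_null=child.nullable)


# Step 1: the projection reuses the generator's output, which ignores the
# "outer" semantics, so c3 is (wrongly) treated as non-nullable.
c3 = Attribute("c3", nullable=False)

# Step 2: array(c3) is built, and consumers such as explode(c4) see this type.
c4_type = create_array(c3)

# Step 3: a later pass corrects the nullable flag on the attribute itself...
c3_fixed = replace(c3, nullable=True)

# ...but the array type captured in step 2 is unchanged.
stale = c4_type.contains_null                   # still False
expected = create_array(c3_fixed).contains_null # True if built after the fix

print(stale, expected)
```

In this sketch, correcting the attribute after the fact leaves the previously derived element nullability stale, which matches the symptom in the examples: the consumer of c4 assumes its elements can never be null.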
This example fails with a NullPointerException:
select c1, inline_outer(c4) from (
  select c1, array(c3) as c4 from (
    select c1, explode_outer(c2) as c3
    from values
      (1, array(named_struct('a', 1, 'b', 2))),
      (2, array(named_struct('a', 3, 'b', 4), named_struct('a', 5, 'b', 6))),
      (3, null)
    as data(c1, c2)
  )
);

22/10/27 11:53:20 ERROR Executor: Exception in task 1.0 in stage 1.0 (TID 2)
java.lang.NullPointerException
    at org.apache.spark.sql.catalyst.expressions.GeneratedClass$GeneratedIteratorForCodegenStage1.generate_doConsume_1$(Unknown Source)
    at org.apache.spark.sql.catalyst.expressions.GeneratedClass$GeneratedIteratorForCodegenStage1.generate_doConsume_0$(Unknown Source)
    at org.apache.spark.sql.catalyst.expressions.GeneratedClass$GeneratedIteratorForCodegenStage1.processNext(Unknown Source)
    at org.apache.spark.sql.execution.BufferedRowIterator.hasNext(BufferedRowIterator.java:43)
    at org.apache.spark.sql.execution.WholeStageCodegenExec$$anon$1.hasNext(WholeStageCodegenExec.scala:760)
    at org.apache.spark.sql.execution.SparkPlan.$anonfun$getByteArrayRdd$1(SparkPlan.scala:364)