[SPARK-40963] ExtractGenerator sets incorrect nullability in new Project - ASF JIRA

Details

Type: Bug
Status: Resolved
Priority: Major
Resolution: Fixed
Affects Version/s: 3.1.3, 3.2.2, 3.3.1, 3.4.0
Fix Version/s: 3.2.3, 3.3.2, 3.4.0
Component/s: SQL
Labels:
- correctness

Description

Example:

select c1, explode(c4) as c5 from (
  select c1, array(c3) as c4 from (
    select c1, explode_outer(c2) as c3
    from values
    (1, array(1, 2)),
    (2, array(2, 3)),
    (3, null)
    as data(c1, c2)
  )
);

+---+---+
|c1 |c5 |
+---+---+
|1  |1  |
|1  |2  |
|2  |2  |
|2  |3  |
|3  |0  |
+---+---+

In the last row, c5 is 0, but should be NULL.

Another example:

select c1, exists(c4, x -> x is null) as c5 from (
  select c1, array(c3) as c4 from (
    select c1, explode_outer(c2) as c3
    from values
    (1, array(1, 2)),
    (2, array(2, 3)),
    (3, null)
    as data(c1, c2)
  )
);

+---+-----+
|c1 |c5   |
+---+-----+
|1  |false|
|1  |false|
|2  |false|
|2  |false|
|3  |false|
+---+-----+

In the last row, c5 is false but should be true, because c4 holds a single NULL element.

In both cases, at the time CreateArray(c3) is instantiated, c3's nullability is incorrect because the new projection created by ExtractGenerator uses generatorOutput from explode_outer(c2) as a projection list. generatorOutput doesn't take into account that explode_outer(c2) is an outer explode, so the nullability setting is lost.
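The loss of nullability described above can be sketched with a toy model in Python. This is not Spark's code: Attr, ArrayOf, create_array, and with_outer are hypothetical names, and the model assumes only that an array's containsNull is derived from its element's nullable flag at the moment the array expression is built.

```python
from dataclasses import dataclass, replace


@dataclass(frozen=True)
class Attr:
    """Hypothetical stand-in for an attribute such as c3."""
    name: str
    nullable: bool


@dataclass(frozen=True)
class ArrayOf:
    """Hypothetical stand-in for the result type of array(c3)."""
    element: Attr
    contains_null: bool


def create_array(element: Attr) -> ArrayOf:
    # Assumption: containsNull is taken from the element's nullable flag
    # when the array expression is constructed.
    return ArrayOf(element, contains_null=element.nullable)


def with_outer(attr: Attr, outer: bool) -> Attr:
    # An outer generator emits a NULL row for NULL input, so its output
    # must be nullable regardless of what the generator itself reports.
    return replace(attr, nullable=attr.nullable or outer)


# The generator reports c3 as non-nullable: the elements of c2 are non-null.
c3_from_generator = Attr("c3", nullable=False)

# Projection list built directly from the generator's output: outer is ignored.
c4_buggy = create_array(c3_from_generator)

# Projection list that accounts for the outer flag.
c4_expected = create_array(with_outer(c3_from_generator, outer=True))

print(c4_buggy.contains_null, c4_expected.contains_null)
```

In this model the only difference between the two arrays is whether the outer flag was folded into c3's nullability before array(c3) was built.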

UpdateAttributeNullability will eventually fix the nullable setting for attributes referring to c3, but it doesn't fix the containsNull setting for c4 in explode(c4) (from the first example) or exists(c4, x -> x is null) (from the second example).
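Why a later correction of c3 does not help can also be sketched as a toy model in Python. Again, this is not Spark's code: Attr, ArrayType, and read_elements are hypothetical, and the choice to read an empty slot as 0 when contains_null is false is an assumption made to mirror the output of the first example.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class Attr:
    """Hypothetical attribute whose nullable flag can be corrected later."""
    name: str
    nullable: bool


@dataclass(frozen=True)
class ArrayType:
    """Hypothetical array type; contains_null is fixed once it is built."""
    contains_null: bool


def read_elements(slots: List[Optional[int]], array_type: ArrayType) -> List[Optional[int]]:
    """Toy consumer standing in for explode(c4)."""
    if array_type.contains_null:
        # Null checks are performed, so NULL elements survive.
        return list(slots)
    # Assumption: with contains_null=False the null check is skipped and an
    # empty int slot is read as the primitive default 0.
    return [0 if s is None else s for s in slots]


c3 = Attr("c3", nullable=False)                  # wrong flag from the new Project
c4_type = ArrayType(contains_null=c3.nullable)   # snapshot taken at this point

c3.nullable = True                               # a later pass corrects c3 only

stale = read_elements([None], c4_type)
fresh = read_elements([None], ArrayType(contains_null=c3.nullable))
print(stale, fresh)
```

The snapshot in c4_type is unaffected by the later correction of c3, so the consumer that trusts it still treats the array as free of NULL elements.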

This example fails with a NullPointerException:

select c1, inline_outer(c4) from (
  select c1, array(c3) as c4 from (
    select c1, explode_outer(c2) as c3
    from values
    (1, array(named_struct('a', 1, 'b', 2))),
    (2, array(named_struct('a', 3, 'b', 4), named_struct('a', 5, 'b', 6))),
    (3, null)
    as data(c1, c2)
  )
);

22/10/27 11:53:20 ERROR Executor: Exception in task 1.0 in stage 1.0 (TID 2)
java.lang.NullPointerException
	at org.apache.spark.sql.catalyst.expressions.GeneratedClass$GeneratedIteratorForCodegenStage1.generate_doConsume_1$(Unknown Source)
	at org.apache.spark.sql.catalyst.expressions.GeneratedClass$GeneratedIteratorForCodegenStage1.generate_doConsume_0$(Unknown Source)
	at org.apache.spark.sql.catalyst.expressions.GeneratedClass$GeneratedIteratorForCodegenStage1.processNext(Unknown Source)
	at org.apache.spark.sql.execution.BufferedRowIterator.hasNext(BufferedRowIterator.java:43)
	at org.apache.spark.sql.execution.WholeStageCodegenExec$$anon$1.hasNext(WholeStageCodegenExec.scala:760)
	at org.apache.spark.sql.execution.SparkPlan.$anonfun$getByteArrayRdd$1(SparkPlan.scala:364)

Issue Links

links to

[Github] Pull Request #38440 (bersprockets)

People

Assignee: Bruce Robbins

Reporter: Bruce Robbins

Dates

Created: 28/Oct/22 23:13

Updated: 12/Dec/22 18:11

Resolved: 31/Oct/22 01:50