[SPARK-45171] GenerateExec fails to initialize non-deterministic expressions before use - ASF JIRA

Details

Type: Bug
Status: Resolved
Priority: Major
Resolution: Fixed
Affects Version/s: 3.5.0
Fix Version/s: 4.0.0, 3.5.1
Component/s: SQL
Labels:
- pull-request-available

Description

The following query fails:

select *
from explode(
  transform(sequence(0, cast(rand()*1000 as int) + 1), x -> x * 22)
);

The error is:

23/09/14 09:27:25 ERROR Executor: Exception in task 0.0 in stage 3.0 (TID 3)
java.lang.IllegalArgumentException: requirement failed: Nondeterministic expression org.apache.spark.sql.catalyst.expressions.Rand should be initialized before eval.
	at scala.Predef$.require(Predef.scala:281)
	at org.apache.spark.sql.catalyst.expressions.Nondeterministic.eval(Expression.scala:497)
	at org.apache.spark.sql.catalyst.expressions.Nondeterministic.eval$(Expression.scala:495)
	at org.apache.spark.sql.catalyst.expressions.RDG.eval(randomExpressions.scala:35)
	at org.apache.spark.sql.catalyst.expressions.BinaryArithmetic.eval(arithmetic.scala:384)
	at org.apache.spark.sql.catalyst.expressions.UnaryExpression.eval(Expression.scala:543)
	at org.apache.spark.sql.catalyst.expressions.BinaryArithmetic.eval(arithmetic.scala:384)
	at org.apache.spark.sql.catalyst.expressions.Sequence.eval(collectionOperations.scala:3062)
	at org.apache.spark.sql.catalyst.expressions.SimpleHigherOrderFunction.eval(higherOrderFunctions.scala:275)
	at org.apache.spark.sql.catalyst.expressions.SimpleHigherOrderFunction.eval$(higherOrderFunctions.scala:274)
	at org.apache.spark.sql.catalyst.expressions.ArrayTransform.eval(higherOrderFunctions.scala:308)
	at org.apache.spark.sql.catalyst.expressions.ExplodeBase.eval(generators.scala:375)
	at org.apache.spark.sql.execution.GenerateExec.$anonfun$doExecute$8(GenerateExec.scala:108)
...

However, this query succeeds:

select *
from explode(
  sequence(0, cast(rand()*1000 as int) + 1)
);

The difference is that transform disables whole-stage codegen. That exposes a bug in GenerateExec's non-codegen (interpreted) execution path: the non-deterministic expression inside the generator's input (here, rand()) is not initialized before it is evaluated.
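
The initialize-before-eval rule behind this error can be modeled in a few lines. The sketch below is a toy model in Python, not Spark code. The class names echo the Catalyst expressions in the stack trace (Nondeterministic, Rand, Explode), and CreateArray echoes array(). The helpers foreach, initialize_nondeterministic and run_generator are hypothetical simplifications, and the per-partition seeding of Rand is an assumption. It shows two things: evaluating a generator whose input tree holds an uninitialized Rand fails with the same kind of "should be initialized before eval" message, and walking the tree to call initialize(partition_index) on every non-deterministic node first makes the evaluation succeed.

```python
import random


class Expression:
    """Toy stand-in for a Catalyst expression node (hypothetical, not Spark code)."""

    def children(self):
        return []


class Nondeterministic(Expression):
    """Models the rule that eval() must be preceded by initialize(partition_index)."""

    def __init__(self):
        self._initialized = False

    def initialize(self, partition_index):
        self.init_internal(partition_index)
        self._initialized = True

    def init_internal(self, partition_index):
        pass

    def eval(self):
        if not self._initialized:
            # Analogous to the IllegalArgumentException raised by Scala's require(...)
            raise ValueError(
                "requirement failed: Nondeterministic expression "
                f"{type(self).__name__} should be initialized before eval."
            )
        return self.eval_internal()


class Rand(Nondeterministic):
    def __init__(self, seed):
        super().__init__()
        self.seed = seed
        self._rng = None

    def init_internal(self, partition_index):
        # Assumed per-partition seeding (seed + partition index).
        self._rng = random.Random(self.seed + partition_index)

    def eval_internal(self):
        return self._rng.random()


class CreateArray(Expression):
    def __init__(self, *elements):
        self.elements = list(elements)

    def children(self):
        return self.elements

    def eval(self):
        return [e.eval() for e in self.elements]


class Explode(Expression):
    def __init__(self, child):
        self.child = child

    def children(self):
        return [self.child]

    def eval(self):
        # One output row per array element.
        return list(self.child.eval())


def foreach(expr, fn):
    """Apply fn to expr and every descendant, pre-order."""
    fn(expr)
    for child in expr.children():
        foreach(child, fn)


def initialize_nondeterministic(expr, partition_index):
    """Initialize every Nondeterministic node in the tree for this partition."""
    def init(node):
        if isinstance(node, Nondeterministic):
            node.initialize(partition_index)
    foreach(expr, init)


def run_generator(generator, partition_index, initialize=True):
    # initialize=False models the buggy path: the tree is evaluated as-is.
    if initialize:
        initialize_nondeterministic(generator, partition_index)
    return generator.eval()


if __name__ == "__main__":
    gen = Explode(CreateArray(Rand(seed=42)))
    try:
        run_generator(gen, partition_index=0, initialize=False)
    except ValueError as err:
        print(err)  # mirrors the failure in the stack trace
    print(run_generator(gen, partition_index=0))  # a one-element list holding a float in [0, 1)
```

The initialization walks the whole expression tree rather than only the generator node itself. This is because the non-deterministic expression can sit arbitrarily deep in the generator's input, as rand() does under sequence and transform in the first query.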

An even simpler repro case is:

set spark.sql.codegen.wholeStage=false;

select explode(array(rand()));

Issue Links

links to

GitHub Pull Request #42933

People

Assignee: Bruce Robbins

Reporter: Bruce Robbins

Votes: 0

Watchers: 2

Dates

Created: 14/Sep/23 16:37

Updated: 15/Sep/23 04:23

Resolved: 15/Sep/23 04:23