Spark / SPARK-45171

GenerateExec fails to initialize non-deterministic expressions before use


Details

    Description

      The following query fails:

      select *
      from explode(
        transform(sequence(0, cast(rand()*1000 as int) + 1), x -> x * 22)
      );
      

      The error is:

      23/09/14 09:27:25 ERROR Executor: Exception in task 0.0 in stage 3.0 (TID 3)
      java.lang.IllegalArgumentException: requirement failed: Nondeterministic expression org.apache.spark.sql.catalyst.expressions.Rand should be initialized before eval.
      	at scala.Predef$.require(Predef.scala:281)
      	at org.apache.spark.sql.catalyst.expressions.Nondeterministic.eval(Expression.scala:497)
      	at org.apache.spark.sql.catalyst.expressions.Nondeterministic.eval$(Expression.scala:495)
      	at org.apache.spark.sql.catalyst.expressions.RDG.eval(randomExpressions.scala:35)
      	at org.apache.spark.sql.catalyst.expressions.BinaryArithmetic.eval(arithmetic.scala:384)
      	at org.apache.spark.sql.catalyst.expressions.UnaryExpression.eval(Expression.scala:543)
      	at org.apache.spark.sql.catalyst.expressions.BinaryArithmetic.eval(arithmetic.scala:384)
      	at org.apache.spark.sql.catalyst.expressions.Sequence.eval(collectionOperations.scala:3062)
      	at org.apache.spark.sql.catalyst.expressions.SimpleHigherOrderFunction.eval(higherOrderFunctions.scala:275)
      	at org.apache.spark.sql.catalyst.expressions.SimpleHigherOrderFunction.eval$(higherOrderFunctions.scala:274)
      	at org.apache.spark.sql.catalyst.expressions.ArrayTransform.eval(higherOrderFunctions.scala:308)
      	at org.apache.spark.sql.catalyst.expressions.ExplodeBase.eval(generators.scala:375)
      	at org.apache.spark.sql.execution.GenerateExec.$anonfun$doExecute$8(GenerateExec.scala:108)
      ...        
      

      However, this query succeeds:

      select *
      from explode(
        sequence(0, cast(rand()*1000 as int) + 1)
      );
      

      The difference is that transform turns off whole-stage codegen for the query. That exposes a bug in the interpreted GenerateExec path: the non-deterministic expression passed to the generator function is never initialized before it is evaluated.
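To make the failure mode concrete, here is a minimal Scala sketch of the contract that the error message describes: a non-deterministic expression must be initialized for its partition before eval is called. Every name in it (Expr, NondeterministicExpr, RandExpr, ArrayExpr, GenerateSketch, initializeAll, explodePartition) is hypothetical and a deliberate simplification. None of it is Spark's actual code or its fix. It only shows the ordering the interpreted GenerateExec path is missing.

```scala
import scala.util.Random

// Hypothetical, simplified model of the "initialize before eval" contract.
// These are NOT Spark's real classes.
trait Expr {
  def children: Seq[Expr] = Nil
  def eval(): Any
}

trait NondeterministicExpr extends Expr {
  private var initialized = false
  protected def initializeInternal(partitionIndex: Int): Unit
  protected def evalInternal(): Any

  // Must be called once per partition before any call to eval().
  final def initialize(partitionIndex: Int): Unit = {
    initializeInternal(partitionIndex)
    initialized = true
  }

  // Predef.require throws IllegalArgumentException("requirement failed: ...").
  final override def eval(): Any = {
    require(initialized,
      s"Nondeterministic expression ${getClass.getName} should be initialized before eval.")
    evalInternal()
  }
}

// Stand-in for rand(): its RNG only exists after initialize() runs.
final class RandExpr(seed: Long) extends NondeterministicExpr {
  private var rng: Random = null
  protected def initializeInternal(partitionIndex: Int): Unit =
    rng = new Random(seed + partitionIndex)
  protected def evalInternal(): Any = rng.nextDouble()
}

// Stand-in for array(...): evaluates its children into a Seq.
final class ArrayExpr(elems: Seq[Expr]) extends Expr {
  override def children: Seq[Expr] = elems
  def eval(): Any = elems.map(_.eval())
}

object GenerateSketch {
  // Recursively initialize every nondeterministic node in the expression tree.
  def initializeAll(e: Expr, partitionIndex: Int): Unit = {
    e match {
      case n: NondeterministicExpr => n.initialize(partitionIndex)
      case _                       => ()
    }
    e.children.foreach(initializeAll(_, partitionIndex))
  }

  // Hypothetical interpreted "explode" over one partition: initialize the
  // generator's expression tree first, then evaluate it once per input row.
  def explodePartition(generator: Expr, partitionIndex: Int, numRows: Int): Seq[Any] = {
    initializeAll(generator, partitionIndex)
    (0 until numRows).flatMap(_ => generator.eval().asInstanceOf[Seq[Any]])
  }
}
```

If you skip the initializeAll step, for example by calling new ArrayExpr(Seq(new RandExpr(42L))).eval() directly, the call fails with IllegalArgumentException: requirement failed: ... This is the same kind of failure as the executor error above. The sketch initializes once per partition, before the per-row loop, because a seeded RNG should be set up per partition rather than reset on every row.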

      An even simpler repro case, which turns off whole-stage codegen explicitly, is:

      set spark.sql.codegen.wholeStage=false;
      
      select explode(array(rand()));
      


    People: Bruce Robbins (bersprockets)