SPARK-48871: Fix INVALID_NON_DETERMINISTIC_EXPRESSIONS validation in CheckAnalysis


Details

    • Type: Bug
    • Status: Resolved
    • Priority: Major
    • Resolution: Fixed
    • Affects Version/s: 4.0.0, 3.5.2, 3.4.4
    • Fix Version/s: 4.0.0, 3.5.2
    • Component/s: SQL

    Description

      I encountered the following exception when attempting to use a non-deterministic UDF in my query (a minimal sketch of such a UDF follows the stack trace below).

      [info] org.apache.spark.sql.catalyst.ExtendedAnalysisException: [INVALID_NON_DETERMINISTIC_EXPRESSIONS] The operator expects a deterministic expression, but the actual expression is "[some expression]".; line 2 pos 1
      [info] [some logical plan]
      [info] at org.apache.spark.sql.catalyst.analysis.package$AnalysisErrorAt.failAnalysis(package.scala:52)
      [info] at org.apache.spark.sql.catalyst.analysis.CheckAnalysis.$anonfun$checkAnalysis0$2(CheckAnalysis.scala:761)
      [info] at org.apache.spark.sql.catalyst.analysis.CheckAnalysis.$anonfun$checkAnalysis0$2$adapted(CheckAnalysis.scala:182)
      [info] at org.apache.spark.sql.catalyst.trees.TreeNode.foreachUp(TreeNode.scala:244)
      [info] at org.apache.spark.sql.catalyst.analysis.CheckAnalysis.checkAnalysis0(CheckAnalysis.scala:182)
      [info] at org.apache.spark.sql.catalyst.analysis.CheckAnalysis.checkAnalysis0$(CheckAnalysis.scala:164)
      [info] at org.apache.spark.sql.catalyst.analysis.Analyzer.checkAnalysis0(Analyzer.scala:188)
      [info] at org.apache.spark.sql.catalyst.analysis.CheckAnalysis.checkAnalysis(CheckAnalysis.scala:160)
      [info] at org.apache.spark.sql.catalyst.analysis.CheckAnalysis.checkAnalysis$(CheckAnalysis.scala:150)
      [info] at org.apache.spark.sql.catalyst.analysis.Analyzer.checkAnalysis(Analyzer.scala:188)
      [info] at org.apache.spark.sql.catalyst.analysis.Analyzer.$anonfun$executeAndCheck$1(Analyzer.scala:211)
      [info] at org.apache.spark.sql.catalyst.plans.logical.AnalysisHelper$.markInAnalyzer(AnalysisHelper.scala:330)
      [info] at org.apache.spark.sql.catalyst.analysis.Analyzer.executeAndCheck(Analyzer.scala:208)
      [info] at org.apache.spark.sql.execution.QueryExecution.$anonfun$analyzed$1(QueryExecution.scala:77)
      [info] at org.apache.spark.sql.catalyst.QueryPlanningTracker.measurePhase(QueryPlanningTracker.scala:138)
      [info] at org.apache.spark.sql.execution.QueryExecution.$anonfun$executePhase$2(QueryExecution.scala:219)
      [info] at org.apache.spark.sql.execution.QueryExecution$.withInternalError(QueryExecution.scala:546)
      [info] at org.apache.spark.sql.execution.QueryExecution.$anonfun$executePhase$1(QueryExecution.scala:219)
      [info] at org.apache.spark.sql.SparkSession.withActive(SparkSession.scala:900)
      [info] at org.apache.spark.sql.execution.QueryExecution.executePhase(QueryExecution.scala:218)
      [info] at org.apache.spark.sql.execution.QueryExecution.analyzed$lzycompute(QueryExecution.scala:77)
      [info] at org.apache.spark.sql.execution.QueryExecution.analyzed(QueryExecution.scala:74)
      [info] at org.apache.spark.sql.execution.QueryExecution.assertAnalyzed(QueryExecution.scala:66)
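
      For reference, the UDF involved is one explicitly marked as non-deterministic. Below is a minimal sketch, assuming an existing SparkSession named `spark`; the UDF body and column name are placeholders. On its own this snippet does not reproduce the error, because the expression must also end up under an operator that CheckAnalysis does not allow-list (in my case, a custom LogicalPlan, as described below).

        import org.apache.spark.sql.functions.udf

        // A UDF that Spark must treat as non-deterministic: each invocation may
        // return a different value for the same input.
        val randomTag = udf(() => scala.util.Random.nextInt(100)).asNondeterministic()

        // Under allow-listed operators such as Project this analyzes fine; the
        // failure above occurs when the same expression sits under an operator
        // that CheckAnalysis does not recognize.
        val df = spark.range(10).withColumn("tag", randomTag())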

      The non-deterministic expression can be safely allowed for my custom LogicalPlan, but it is rejected in the checkAnalysis phase. The CheckAnalysis rule is too strict, so reasonable use cases of non-deterministic expressions are disallowed as well.

      To fix this, we could add a trait that logical plans can extend, with a method that decides whether non-deterministic expressions are allowed for that operator, and consult it in checkAnalysis. This delegates the validation to frameworks that extend Spark, so the allow list can cover more than just the few explicitly named logical plans (e.g. `Project`, `Filter`).
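
      A minimal sketch of that direction; the trait and method names below are illustrative assumptions, not necessarily what the eventual change uses.

        import org.apache.spark.sql.catalyst.expressions.{Attribute, Expression}
        import org.apache.spark.sql.catalyst.plans.logical.{LogicalPlan, UnaryNode}

        // Hypothetical opt-in trait: an operator mixes this in to declare that it
        // can safely host non-deterministic expressions.
        trait SupportsNonDeterministicExpression { self: LogicalPlan =>
          /** Whether CheckAnalysis should accept non-deterministic expressions here. */
          def allowNonDeterministicExpression: Boolean
        }

        // Example: a framework-specific operator whose condition may contain a
        // non-deterministic UDF (the operator name is a placeholder).
        case class MyFrameworkFilter(condition: Expression, child: LogicalPlan)
          extends UnaryNode with SupportsNonDeterministicExpression {

          override def allowNonDeterministicExpression: Boolean = true

          override def output: Seq[Attribute] = child.output

          override protected def withNewChildInternal(newChild: LogicalPlan): MyFrameworkFilter =
            copy(child = newChild)
        }

      CheckAnalysis would then skip the INVALID_NON_DETERMINISTIC_EXPRESSIONS failure for operators that mix in the trait and return true, instead of matching only on the hard-coded list of built-in operators.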

    People

      Assignee: Carmen Kwan (c27kwan)
      Reporter: Carmen Kwan (c27kwan)
      Votes: 0
      Watchers: 2
