Spark / SPARK-20273

Disallow Non-deterministic Filter push-down into Join Conditions


Details

    • Type: Bug
    • Status: Resolved
    • Priority: Major
    • Resolution: Fixed
    • Affects Version/s: 2.0.2, 2.1.0
    • Fix Version/s: 2.2.0
    • Component/s: SQL
    • Labels: None

    Description

      sql("SELECT t1.b, rand(0) as r FROM cachedData, cachedData t1 GROUP BY t1.b having r > 0.5").show()
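      Note: the ticket does not define cachedData; presumably any small cached
      temporary view with a column b will do. A hypothetical setup such as the
      following (an assumption, not part of the original report) reproduces the
      failure in spark-shell:

        // Hypothetical setup: any small cached temporary view with a
        // column `b` triggers the same failure with the query above.
        spark.range(10).selectExpr("id AS a", "id % 3 AS b")
          .createOrReplaceTempView("cachedData")
        spark.catalog.cacheTable("cachedData")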
      

      We will get the following error:

      Job aborted due to stage failure: Task 1 in stage 4.0 failed 1 times, most recent failure: Lost task 1.0 in stage 4.0 (TID 8, localhost, executor driver): java.lang.NullPointerException
      	at org.apache.spark.sql.catalyst.expressions.GeneratedClass$SpecificPredicate.eval(Unknown Source)
      	at org.apache.spark.sql.execution.joins.BroadcastNestedLoopJoinExec$$anonfun$org$apache$spark$sql$execution$joins$BroadcastNestedLoopJoinExec$$boundCondition$1.apply(BroadcastNestedLoopJoinExec.scala:87)
      	at org.apache.spark.sql.execution.joins.BroadcastNestedLoopJoinExec$$anonfun$org$apache$spark$sql$execution$joins$BroadcastNestedLoopJoinExec$$boundCondition$1.apply(BroadcastNestedLoopJoinExec.scala:87)
      	at scala.collection.Iterator$$anon$13.hasNext(Iterator.scala:463)
      

      Filters can be pushed down into join conditions by the optimizer rule PushPredicateThroughJoin. However, the analyzer blocks users from writing non-deterministic join conditions directly (for details, see the PR https://github.com/apache/spark/pull/7535).
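
      The inconsistency is easy to see: writing the same non-deterministic
      predicate as an explicit join condition is rejected at analysis time,
      while the HAVING form above slips past that check via push-down. A rough
      sketch (assuming a spark-shell session; the exact AnalysisException
      wording varies by version):

        import org.apache.spark.sql.functions.rand

        val df = spark.range(10).toDF("b")
        // Rejected by the analyzer: nondeterministic expressions are only
        // allowed in Project, Filter, Aggregate or Window; yet the
        // optimizer-pushed predicate above reaches execution and NPEs.
        df.as("t0").join(df.as("t1"), rand(0) > 0.5).show()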

      Either we should not push down non-deterministic conditions, or we should allow users to write them directly by explicitly initializing the non-deterministic expressions.
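
      For background on the "explicitly initialize" option: rand() is backed by
      Catalyst's Nondeterministic trait, whose per-partition state must be set
      up via initialize(partitionIndex) before eval() is called; the generated
      join predicate above evaluates the expression without that step, hence
      the NPE. A minimal interpreted-mode sketch (against Spark 2.1-era
      internals, which may differ in other versions):

        import org.apache.spark.sql.catalyst.InternalRow
        import org.apache.spark.sql.catalyst.expressions.{Literal, Rand}

        val r = Rand(Literal(0L))
        // r.eval(InternalRow.empty)  // fails: not initialized yet
        r.initialize(0)               // set the per-partition RNG state first
        println(r.eval(InternalRow.empty))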


          People

            Assignee: Xiao Li (smilegator)
            Reporter: Xiao Li (smilegator)
            Votes: 0
            Watchers: 2
