Spark / SPARK-25051

where clause on dataset gives AnalysisException

    Details

    • Type: Bug
    • Status: Resolved
    • Priority: Blocker
    • Resolution: Fixed
    • Affects Version/s: 2.3.0
    • Fix Version/s: 2.3.2
    • Component/s: SQL
    • Labels:

      Description

      schemas:
      df1 => id, ts
      df2 => id, name, country

      code:

      val df = df1.join(df2, Seq("id"), "left_outer").where(df2("id").isNull)

      error:

      org.apache.spark.sql.AnalysisException:Resolved attribute(s) id#0 missing from xx#15,xx#9L,id#5,xx#6,xx#11,xx#14,xx#13,xx#12,xx#7,xx#16,xx#10,xx#8L in operator !Filter isnull(id#0). Attribute(s) with the same name appear in the operation: id. Please check if the right attribute(s) are used.;;

          at org.apache.spark.sql.catalyst.analysis.CheckAnalysis$class.failAnalysis(CheckAnalysis.scala:41)
          at org.apache.spark.sql.catalyst.analysis.Analyzer.failAnalysis(Analyzer.scala:91)
          at org.apache.spark.sql.catalyst.analysis.CheckAnalysis$$anonfun$checkAnalysis$1.apply(CheckAnalysis.scala:289)
          at org.apache.spark.sql.catalyst.analysis.CheckAnalysis$$anonfun$checkAnalysis$1.apply(CheckAnalysis.scala:80)
          at org.apache.spark.sql.catalyst.trees.TreeNode.foreachUp(TreeNode.scala:127)
          at org.apache.spark.sql.catalyst.analysis.CheckAnalysis$class.checkAnalysis(CheckAnalysis.scala:80)
          at org.apache.spark.sql.catalyst.analysis.Analyzer.checkAnalysis(Analyzer.scala:91)
          at org.apache.spark.sql.catalyst.analysis.Analyzer.executeAndCheck(Analyzer.scala:104)
          at org.apache.spark.sql.execution.QueryExecution.analyzed$lzycompute(QueryExecution.scala:57)
          at org.apache.spark.sql.execution.QueryExecution.analyzed(QueryExecution.scala:55)
          at org.apache.spark.sql.execution.QueryExecution.assertAnalyzed(QueryExecution.scala:47)
          at org.apache.spark.sql.Dataset.<init>(Dataset.scala:172)
          at org.apache.spark.sql.Dataset.<init>(Dataset.scala:178)
          at org.apache.spark.sql.Dataset$.apply(Dataset.scala:65)
          at org.apache.spark.sql.Dataset.withTypedPlan(Dataset.scala:3300)
          at org.apache.spark.sql.Dataset.filter(Dataset.scala:1458)
          at org.apache.spark.sql.Dataset.where(Dataset.scala:1486)

      The same code works fine in Spark 2.2.2.
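
A possible way to express the same query without referencing `df2("id")` after a `Seq("id")` join is sketched below. This is not part of the original report and has not been verified against 2.3.0; the sample rows, the aliases `a` and `b`, and the app name are assumptions made for illustration. Option 1 joins on an explicit condition and filters by qualified column name; option 2 uses a `left_anti` join, which returns only the `df1` columns rather than the full joined row.

```scala
import org.apache.spark.sql.SparkSession
import org.apache.spark.sql.functions.col

// Local session for illustration only; in spark-shell, `spark` already exists.
val spark = SparkSession.builder()
  .master("local[1]")
  .appName("SPARK-25051-sketch") // hypothetical app name
  .getOrCreate()
import spark.implicits._

// Hypothetical sample rows matching the schemas in the report.
val df1 = Seq((1L, "2018-08-01"), (2L, "2018-08-02")).toDF("id", "ts")
val df2 = Seq((1L, "a", "US")).toDF("id", "name", "country")

// Option 1: alias both sides, join on an explicit condition, and filter
// on the qualified right-hand column. Both id columns stay in the output.
val viaAlias = df1.as("a")
  .join(df2.as("b"), col("a.id") === col("b.id"), "left_outer")
  .where(col("b.id").isNull)

// Option 2: a left_anti join keeps only the df1 rows that have no match
// in df2, and outputs only the df1 columns (id, ts).
val viaAntiJoin = df1.join(df2, Seq("id"), "left_anti")

// Collect the surviving ids from each variant.
val aliasIds = viaAlias.select(col("a.id")).as[Long].collect().toSeq
val antiIds = viaAntiJoin.select("id").as[Long].collect().toSeq
```

With the sample rows above, both variants keep only the `df1` row whose `id` has no counterpart in `df2`.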

              People

              • Assignee: Marco Gaido (mgaido)
              • Reporter: MIK (MIK1007)