Spark / SPARK-18058

AnalysisException may be thrown when unioning two DataFrames whose struct fields have different nullability


    Details

    • Type: Bug
    • Status: Resolved
    • Priority: Major
    • Resolution: Fixed
    • Affects Version/s: 1.6.2, 2.0.1
    • Fix Version/s: 2.0.2, 2.1.0
    • Component/s: SQL
    • Labels:
      None

      Description

      The following Spark shell snippet reproduces this issue:

      spark.range(10).createOrReplaceTempView("t1")
      spark.range(10).map(i => i: java.lang.Long).toDF("id").createOrReplaceTempView("t2")
      sql("SELECT struct(id) FROM t1 UNION ALL SELECT struct(id) FROM t2")
      
      org.apache.spark.sql.AnalysisException: Union can only be performed on tables with the compatible column types. StructType(StructField(id,LongType,true)) <> StructType(StructField(id,LongType,false)) at the first column of the second table;
        at org.apache.spark.sql.catalyst.analysis.CheckAnalysis$class.failAnalysis(CheckAnalysis.scala:40)
        at org.apache.spark.sql.catalyst.analysis.Analyzer.failAnalysis(Analyzer.scala:57)
        at org.apache.spark.sql.catalyst.analysis.CheckAnalysis$$anonfun$checkAnalysis$1$$anonfun$apply$11$$anonfun$apply$12.apply(CheckAnalysis.scala:291)
        at org.apache.spark.sql.catalyst.analysis.CheckAnalysis$$anonfun$checkAnalysis$1$$anonfun$apply$11$$anonfun$apply$12.apply(CheckAnalysis.scala:289)
        at scala.collection.immutable.List.foreach(List.scala:381)
        at org.apache.spark.sql.catalyst.analysis.CheckAnalysis$$anonfun$checkAnalysis$1$$anonfun$apply$11.apply(CheckAnalysis.scala:289)
        at org.apache.spark.sql.catalyst.analysis.CheckAnalysis$$anonfun$checkAnalysis$1$$anonfun$apply$11.apply(CheckAnalysis.scala:278)
        at scala.collection.immutable.List.foreach(List.scala:381)
        at org.apache.spark.sql.catalyst.analysis.CheckAnalysis$$anonfun$checkAnalysis$1.apply(CheckAnalysis.scala:278)
        at org.apache.spark.sql.catalyst.analysis.CheckAnalysis$$anonfun$checkAnalysis$1.apply(CheckAnalysis.scala:67)
        at org.apache.spark.sql.catalyst.trees.TreeNode.foreachUp(TreeNode.scala:132)
        at org.apache.spark.sql.catalyst.analysis.CheckAnalysis$class.checkAnalysis(CheckAnalysis.scala:67)
        at org.apache.spark.sql.catalyst.analysis.Analyzer.checkAnalysis(Analyzer.scala:57)
        at org.apache.spark.sql.execution.QueryExecution.assertAnalyzed(QueryExecution.scala:49)
        at org.apache.spark.sql.Dataset$.ofRows(Dataset.scala:61)
        at org.apache.spark.sql.SparkSession.sql(SparkSession.scala:573)
        ... 50 elided
      

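      The reason is that two StructTypes are treated as incompatible even when they differ only in field nullability. In the message above, the two struct types are identical except for the nullable flag (true vs. false) of field id.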
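      To make the distinction concrete, here is a minimal Scala sketch of a type comparison that ignores nullability, together with a helper that merges the two sides so the union result is nullable wherever either input is. It uses a hypothetical, stripped-down type model (DType, StructT, Field, ArrayT and NullabilityCheck are illustrative names, not Spark classes) so that it runs without Spark on the classpath; it is not the patch that resolved this issue.

```scala
// Hypothetical, stripped-down model of SQL data types. These are NOT Spark's
// classes, just enough structure to illustrate the nullability problem.
sealed trait DType
case object LongT extends DType
case object StringT extends DType
final case class Field(name: String, dataType: DType, nullable: Boolean)
final case class StructT(fields: Seq[Field]) extends DType
final case class ArrayT(elementType: DType, containsNull: Boolean) extends DType

object NullabilityCheck {
  // Structural equality that ignores every nullability flag (struct fields
  // and array elements), recursing into nested types. Field names and
  // field order must still match.
  def sameTypeIgnoringNullability(a: DType, b: DType): Boolean = (a, b) match {
    case (StructT(fa), StructT(fb)) =>
      fa.length == fb.length && fa.zip(fb).forall { case (x, y) =>
        x.name == y.name && sameTypeIgnoringNullability(x.dataType, y.dataType)
      }
    case (ArrayT(ea, _), ArrayT(eb, _)) => sameTypeIgnoringNullability(ea, eb)
    case _ => a == b
  }

  // Merges two types that differ only in nullability (callers should check
  // sameTypeIgnoringNullability first). A field or element in the result is
  // nullable if it is nullable on either side, so rows from both inputs fit.
  def mergeNullability(a: DType, b: DType): DType = (a, b) match {
    case (StructT(fa), StructT(fb)) =>
      StructT(fa.zip(fb).map { case (x, y) =>
        Field(x.name, mergeNullability(x.dataType, y.dataType), x.nullable || y.nullable)
      })
    case (ArrayT(ea, na), ArrayT(eb, nb)) =>
      ArrayT(mergeNullability(ea, eb), na || nb)
    case _ => a
  }
}

object NullabilityDemo {
  def main(args: Array[String]): Unit = {
    // The two struct types from the error message, one per side of the union.
    val nonNullable = StructT(Seq(Field("id", LongT, nullable = false)))
    val nullable    = StructT(Seq(Field("id", LongT, nullable = true)))

    // Strict equality rejects the pair, which is the behavior reported here.
    println(nonNullable == nullable)
    // A nullability-insensitive check accepts it.
    println(NullabilityCheck.sameTypeIgnoringNullability(nonNullable, nullable))
    // The merged type keeps the more permissive (nullable) flag.
    println(NullabilityCheck.mergeNullability(nonNullable, nullable) == nullable)
  }
}
```

      Running NullabilityDemo prints false, true, true. The merge step matters because accepting the union is only half the fix: the output schema must also admit nulls coming from the nullable side.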


            People

            • Assignee: Nan Zhu (CodingCat)
            • Reporter: Cheng Lian (lian cheng)
            • Votes: 0
            • Watchers: 6
