Uploaded image for project: 'Spark'
  1. Spark
  2. SPARK-18058

AnalysisException may be thrown when union two DFs whose struct fields have different nullability

    XMLWordPrintableJSON

Details

    • Bug
    • Status: Resolved
    • Major
    • Resolution: Fixed
    • 1.6.2, 2.0.1
    • 2.0.2, 2.1.0
    • SQL
    • None

    Description

      The following Spark shell snippet reproduces this issue:

      spark.range(10).createOrReplaceTempView("t1")
      spark.range(10).map(i => i: java.lang.Long).toDF("id").createOrReplaceTempView("t2")
      sql("SELECT struct(id) FROM t1 UNION ALL SELECT struct(id) FROM t2")
      
      org.apache.spark.sql.AnalysisException: Union can only be performed on tables with the compatible column types. StructType(StructField(id,LongType,true)) <> StructType(StructField(id,LongType,false)) at the first column of the second table;
        at org.apache.spark.sql.catalyst.analysis.CheckAnalysis$class.failAnalysis(CheckAnalysis.scala:40)
        at org.apache.spark.sql.catalyst.analysis.Analyzer.failAnalysis(Analyzer.scala:57)
        at org.apache.spark.sql.catalyst.analysis.CheckAnalysis$$anonfun$checkAnalysis$1$$anonfun$apply$11$$anonfun$apply$12.apply(CheckAnalysis.scala:291)
        at org.apache.spark.sql.catalyst.analysis.CheckAnalysis$$anonfun$checkAnalysis$1$$anonfun$apply$11$$anonfun$apply$12.apply(CheckAnalysis.scala:289)
        at scala.collection.immutable.List.foreach(List.scala:381)
        at org.apache.spark.sql.catalyst.analysis.CheckAnalysis$$anonfun$checkAnalysis$1$$anonfun$apply$11.apply(CheckAnalysis.scala:289)
        at org.apache.spark.sql.catalyst.analysis.CheckAnalysis$$anonfun$checkAnalysis$1$$anonfun$apply$11.apply(CheckAnalysis.scala:278)
        at scala.collection.immutable.List.foreach(List.scala:381)
        at org.apache.spark.sql.catalyst.analysis.CheckAnalysis$$anonfun$checkAnalysis$1.apply(CheckAnalysis.scala:278)
        at org.apache.spark.sql.catalyst.analysis.CheckAnalysis$$anonfun$checkAnalysis$1.apply(CheckAnalysis.scala:67)
        at org.apache.spark.sql.catalyst.trees.TreeNode.foreachUp(TreeNode.scala:132)
        at org.apache.spark.sql.catalyst.analysis.CheckAnalysis$class.checkAnalysis(CheckAnalysis.scala:67)
        at org.apache.spark.sql.catalyst.analysis.Analyzer.checkAnalysis(Analyzer.scala:57)
        at org.apache.spark.sql.execution.QueryExecution.assertAnalyzed(QueryExecution.scala:49)
        at org.apache.spark.sql.Dataset$.ofRows(Dataset.scala:61)
        at org.apache.spark.sql.SparkSession.sql(SparkSession.scala:573)
        ... 50 elided
      

      The reason is that we treat two StructType incompatible even if their only differ from each other in field nullability.

      Attachments

        Activity

          People

            codingcat Nan Zhu
            lian cheng Cheng Lian
            Votes:
            0 Vote for this issue
            Watchers:
            6 Start watching this issue

            Dates

              Created:
              Updated:
              Resolved: