Spark / SPARK-23477

Misleading exception message when union fails due to metadata


Details

    • Type: Bug
    • Status: Resolved
    • Priority: Minor
    • Resolution: Cannot Reproduce
    • Affects Version/s: 2.2.1
    • Fix Version/s: None
    • Component/s: SQL
    • Labels: None

    Description

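      When I have two DataFrames that differ only in the metadata of fields inside a struct, I cannot union them, yet the error message shows that their types are the same:

      from pyspark.sql.functions import col, struct

      df = spark.createDataFrame([{'a': 1}])
      # a: column x is struct<a:bigint>, with no metadata on field a
      a = df.select(struct('a').alias('x'))
      # b: same struct, but field a carries the metadata {'description': 'xxx'}
      b = df.select(col('a').alias('a', metadata={'description': 'xxx'})).select(struct(col('a')).alias('x'))
      a.union(b).printSchema()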

      gives:

      An error occurred while calling o1076.union.
      : org.apache.spark.sql.AnalysisException: Union can only be performed on tables with the compatible column types. struct<a:bigint> <> struct<a:bigint> at the first column of the second table

      This part of the message:

      struct<a:bigint> <> struct<a:bigint>

      makes no sense, because the two printed types are identical and nothing in the message shows what actually differs.
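A small pure-Python sketch of why the message misleads. The `Field` class and `simple_string` function are hypothetical stand-ins, not Spark classes; they assume, based only on the `struct<a:bigint>` text in the error above, that the printed type lists each field's name and type and leaves metadata out. Under that assumption, two fields that differ only in metadata print identically while still comparing as unequal.

```python
from dataclasses import dataclass, field
from typing import Dict, List


@dataclass
class Field:
    # Hypothetical stand-in for a struct field (not a Spark class).
    name: str
    type_name: str
    metadata: Dict[str, str] = field(default_factory=dict)


def simple_string(fields: List[Field]) -> str:
    # Assumed rendering, modelled on "struct<a:bigint>" in the error message:
    # only names and types are printed, metadata is omitted.
    return "struct<" + ",".join(f"{f.name}:{f.type_name}" for f in fields) + ">"


left = [Field("a", "bigint")]
right = [Field("a", "bigint", {"description": "xxx"})]

print(simple_string(left), "<>", simple_string(right))  # struct<a:bigint> <> struct<a:bigint>
print(left == right)  # False: only the metadata differs
```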

       

      Since the field metadata must match for a union to succeed, it should be included in the error message.
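A minimal pure-Python sketch of how such a message could include metadata. `Field`, `simple_string`, `full_string`, `mismatch_message` and its `ordinal` parameter are hypothetical names for illustration, not Spark's API. The sketch assumes the union has already been rejected and only decides what to print: the type-only strings as today, or, when those are identical, a rendering that also shows each field's metadata. The message wording is copied from the report.

```python
from dataclasses import dataclass, field
from typing import Dict, List


@dataclass
class Field:
    # Hypothetical stand-in for a struct field (not a Spark class).
    name: str
    type_name: str
    metadata: Dict[str, str] = field(default_factory=dict)


def simple_string(fields: List[Field]) -> str:
    # Type-only rendering, e.g. struct<a:bigint>.
    return "struct<" + ",".join(f"{f.name}:{f.type_name}" for f in fields) + ">"


def full_string(fields: List[Field]) -> str:
    # Rendering that also prints each field's metadata dict.
    return "struct<" + ",".join(f"{f.name}:{f.type_name} {f.metadata}" for f in fields) + ">"


def mismatch_message(left: List[Field], right: List[Field], ordinal: str = "first") -> str:
    # Called only after the union has been rejected for these two column types.
    l, r = simple_string(left), simple_string(right)
    if l == r:
        # Names and types print identically, so in this model the difference
        # must be in the metadata: show it instead of two identical strings.
        l, r = full_string(left), full_string(right)
    return ("Union can only be performed on tables with the compatible column types. "
            f"{l} <> {r} at the {ordinal} column of the second table")


print(mismatch_message([Field("a", "bigint")],
                       [Field("a", "bigint", {"description": "xxx"})]))
```

Printing metadata only when the type strings tie keeps the message for a genuine type mismatch as short as it is now.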

          People

            Assignee: Unassigned
            Reporter: Tomasz Bartczak (kretes)
            Votes: 1
            Watchers: 3

            Dates

              Created:
              Updated:
              Resolved: