  1. Spark
  2. SPARK-39775

Regression due to AVRO-2035


Details

    • Type: Bug
    • Status: Resolved
    • Priority: Major
    • Resolution: Fixed
    • Affects Version/s: 3.2.1
    • Fix Version/s: 3.3.1, 3.2.3, 3.4.0
    • Component/s: Spark Core
    • Labels: None

    Description

      The upgrade to Avro 1.9.0 brought in https://issues.apache.org/jira/browse/AVRO-2035 (validation of default values in schemas is enabled by default). Schemas whose field defaults do not match the field type are now rejected, which causes regressions when users upgrade their Spark version.

      Repro code:

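      // Run in spark-shell, which provides `spark` and the implicits needed for 'id.
      import org.apache.spark.sql.avro.functions._
      import org.apache.spark.sql.functions.struct

      // A "long" field with a null default: the default does not match the field type.
      val avroTypeStruct = """
                              |{
                              |  "type": "record",
                              |  "name": "struct",
                              |  "fields": [
                              |    {"name": "id", "type": "long", "default": null}
                              |  ]
                              |}""".stripMargin

      // Round-trip a struct column through Avro using the schema above.
      val df = spark.range(10).select(struct('id).as("struct"))
      val avroStructDF = df.select(to_avro('struct, avroTypeStruct).as("avro"))
      avroStructDF.select(from_avro('avro, avroTypeStruct)).show()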
      

      Hive mitigated it by disabling this feature altogether in https://issues.apache.org/jira/browse/HIVE-24797
      Spark-Hive integration also imported the above changes in https://issues.apache.org/jira/browse/SPARK-34512
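
The kind of mitigation mentioned above can be illustrated directly against Avro's schema parser. This is a minimal sketch, not the actual Hive or Spark patch: it assumes Avro is on the classpath and uses `Schema.Parser.setValidateDefaults` to toggle the check explicitly. The object name `AvroDefaultValidationSketch` and the helper `parses` are hypothetical names introduced for illustration.

```scala
import org.apache.avro.{AvroTypeException, Schema}

object AvroDefaultValidationSketch {
  // Same shape as the repro schema: a "long" field with a null default.
  val schemaJson: String =
    """{
      |  "type": "record",
      |  "name": "struct",
      |  "fields": [
      |    {"name": "id", "type": "long", "default": null}
      |  ]
      |}""".stripMargin

  /** Returns true if the schema parses under the given validation setting. */
  def parses(validateDefaults: Boolean): Boolean =
    try {
      // A fresh parser per call, so earlier parses cannot affect the result.
      new Schema.Parser().setValidateDefaults(validateDefaults).parse(schemaJson)
      true
    } catch {
      // Thrown when a field default does not match the field's type.
      case _: AvroTypeException => false
    }

  def main(args: Array[String]): Unit = {
    println(s"validateDefaults=true  parses: ${parses(validateDefaults = true)}")
    println(s"validateDefaults=false parses: ${parses(validateDefaults = false)}")
  }
}
```

Setting the flag explicitly keeps the sketch independent of which Avro version supplies the default. Disabling validation accepts the schema as written, at the cost of no longer catching genuinely mistyped defaults.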

      Can we have a fix that covers all scenarios?

          People

            Assignee: Yuming Wang (yumwang)
            Reporter: Nandini (nandininelson)