SPARK-34133

[AVRO] Respect case sensitivity when performing Catalyst-to-Avro field matching

Details

    • Type: Bug
    • Status: Resolved
    • Priority: Major
    • Resolution: Fixed
    • Affects Version/s: 2.4.0, 3.2.0
    • Fix Version/s: 3.1.1
    • Component/s: Input/Output, SQL
    • Labels: None

    Description

      Spark SQL is case-insensitive by default, but AvroSerializer and AvroDeserializer currently match Catalyst schema fields to Avro schema fields case-sensitively. For example, the following write fails:

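            // Assumes a SparkSession named `spark`; needed for toDF.
            import spark.implicits._

            val avroSchema =
              """
                |{
                |  "type" : "record",
                |  "name" : "test_schema",
                |  "fields" : [
                |    {"name": "foo", "type": "int"},
                |    {"name": "BAR", "type": "int"}
                |  ]
                |}
            """.stripMargin
            // Column names differ from the Avro field names only by case.
            val df = Seq((1, 3), (2, 4)).toDF("FOO", "bar")
      
            // Fails: "FOO"/"bar" are matched against "foo"/"BAR" case-sensitively.
            // savePath is any writable output path.
            df.write.option("avroSchema", avroSchema).format("avro").save(savePath)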
      

      The same is true on the read path. Assuming testAvro was written using the schema above, the following read fails to match the fields:

      import org.apache.spark.sql.types.{IntegerType, StructType}

      spark.read.schema(new StructType().add("FOO", IntegerType).add("bar", IntegerType))
        .format("avro").load(testAvro)
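
The matching behavior requested here can be illustrated with a minimal sketch in plain Scala. This is not Spark's actual implementation: `FieldMatching` and `findAvroField` are hypothetical names, and the `caseSensitive` flag stands in for whatever setting governs case sensitivity in the session (presumably `spark.sql.caseSensitive`, which is an assumption). The sketch also assumes that multiple case-insensitive matches should be reported as an error rather than resolved silently.

```scala
// Hypothetical helper, not part of Spark: finds the Avro field name that
// corresponds to a Catalyst field name, honoring a case-sensitivity flag.
object FieldMatching {
  def findAvroField(
      avroFieldNames: Seq[String],
      catalystName: String,
      caseSensitive: Boolean): Either[String, Option[String]] = {
    // Collect every Avro field whose name matches under the chosen rule.
    val candidates =
      if (caseSensitive) avroFieldNames.filter(_ == catalystName)
      else avroFieldNames.filter(_.equalsIgnoreCase(catalystName))

    candidates match {
      case Seq()       => Right(None)         // no matching field
      case Seq(single) => Right(Some(single)) // exactly one match
      case many        =>                     // assumed policy: ambiguity is an error
        Left(s"Ambiguous match for '$catalystName': ${many.mkString(", ")}")
    }
  }
}
```

With the schema from the description, looking up "FOO" among the Avro fields "foo" and "BAR" finds "foo" when `caseSensitive` is false and finds nothing when it is true, which corresponds to the failure reported above.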
      

            People

              Assignee: Erik Krogen (xkrogen)
              Reporter: Erik Krogen (xkrogen)