Spark / SPARK-34133

[AVRO] Respect case sensitivity when performing Catalyst-to-Avro field matching

    Details

    • Type: Bug
    • Status: Resolved
    • Priority: Major
    • Resolution: Fixed
    • Affects Version/s: 2.4.0, 3.2.0
    • Fix Version/s: 3.1.1
    • Component/s: Input/Output, SQL
    • Labels:
      None

      Description

      Spark SQL is case-insensitive by default (spark.sql.caseSensitive=false), but AvroSerializer and AvroDeserializer currently match Catalyst schema fields against Avro schema fields in a case-sensitive manner. For example, the following write fails because "FOO" and "bar" do not match "foo" and "BAR":

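            // Assumes a SparkSession `spark` in scope and `savePath` pointing at an output directory.
            import spark.implicits._

            // The Avro field names differ from the DataFrame column names only by case.
            val avroSchema =
              """
                |{
                |  "type" : "record",
                |  "name" : "test_schema",
                |  "fields" : [
                |    {"name": "foo", "type": "int"},
                |    {"name": "BAR", "type": "int"}
                |  ]
                |}
            """.stripMargin
            val df = Seq((1, 3), (2, 4)).toDF("FOO", "bar")

            df.write.option("avroSchema", avroSchema).format("avro").save(savePath)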
      

      The same is true on the read path. Assuming testAvro was written using the schema above, the following read fails to match the fields:

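      import org.apache.spark.sql.types.{IntegerType, StructType}
      // Read through the SparkSession (`spark`, assumed in scope); a DataFrame has no `read` method.
      spark.read.schema(new StructType().add("FOO", IntegerType).add("bar", IntegerType))
        .format("avro").load(testAvro)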
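
      The behavior requested here can be sketched in plain Scala, without any Spark dependency. This is a minimal illustration, not the code in AvroSerializer or AvroDeserializer: the object FieldMatchSketch, the method findAvroField, and the decision to raise an error on ambiguous matches are all assumptions made for the example.

```scala
// Hypothetical helper (not Spark's implementation): resolve a Catalyst field
// name against the field names of an Avro record, honoring a case-sensitivity flag.
object FieldMatchSketch {
  def findAvroField(
      catalystName: String,
      avroNames: Seq[String],
      caseSensitive: Boolean): Option[String] = {
    // Collect every Avro field name that matches under the chosen comparison.
    val candidates =
      if (caseSensitive) avroNames.filter(_ == catalystName)
      else avroNames.filter(_.equalsIgnoreCase(catalystName))

    candidates match {
      case Seq(single) => Some(single) // exactly one match
      case Seq()       => None         // no match
      case multiple    =>
        // Assumed policy: several candidates (e.g. "foo" and "FOO" when
        // matching case-insensitively) are treated as an error.
        throw new IllegalArgumentException(
          s"Ambiguous match for '$catalystName': ${multiple.mkString(", ")}")
    }
  }
}

object FieldMatchDemo extends App {
  val avroNames = Seq("foo", "BAR")
  // Case-insensitive matching resolves the columns from the example above.
  println(FieldMatchSketch.findAvroField("FOO", avroNames, caseSensitive = false))
  println(FieldMatchSketch.findAvroField("bar", avroNames, caseSensitive = false))
  // Case-sensitive matching finds nothing, which is the failure reported here.
  println(FieldMatchSketch.findAvroField("FOO", avroNames, caseSensitive = true))
}
```

      With the flag set to false, "FOO" resolves to "foo" and "bar" resolves to "BAR"; with the flag set to true, neither column resolves, which mirrors the failures shown in the two examples above.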
      

              People

              • Assignee: Erik Krogen (xkrogen)
              • Reporter: Erik Krogen (xkrogen)
              • Votes: 0
              • Watchers: 3