[SPARK-34133] [AVRO] Respect case sensitivity when performing Catalyst-to-Avro field matching - ASF JIRA

Details

Type: Bug
Status: Resolved
Priority: Major
Resolution: Fixed
Affects Version/s: 2.4.0, 3.2.0
Fix Version/s: 3.1.1
Component/s: Input/Output, SQL
Labels: None

Description

Spark SQL is case-insensitive by default, but AvroSerializer and AvroDeserializer currently match fields between Catalyst schemas and Avro schemas in a case-sensitive manner. For example, the following write will fail:

      val avroSchema =
        """
          |{
          |  "type" : "record",
          |  "name" : "test_schema",
          |  "fields" : [
          |    {"name": "foo", "type": "int"},
          |    {"name": "BAR", "type": "int"}
          |  ]
          |}
      """.stripMargin
      val df = Seq((1, 3), (2, 4)).toDF("FOO", "bar")

      df.write.option("avroSchema", avroSchema).format("avro").save(savePath)

The same is true on the read path. Assuming testAvro was written using the schema above, the following read will fail to match the fields:

      spark.read.schema(new StructType().add("FOO", IntegerType).add("bar", IntegerType))
        .format("avro").load(testAvro)
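
To illustrate the behavior being requested, here is a minimal sketch of a field lookup that honors a case-sensitivity flag. It is not the code from the pull request: `FieldMatching`, `findField`, and the `caseSensitive` parameter are hypothetical names, and the sketch assumes the flag would be derived from the session's `spark.sql.caseSensitive` setting. It works on plain field names so that it runs without Spark or Avro on the classpath.

```scala
// Hypothetical helper for illustration only; not Spark's actual implementation.
object FieldMatching {

  /** Looks up `wanted` among `candidates` (e.g. the Avro field names).
    *
    * Returns None when nothing matches and Some(name) for a unique match.
    * Throws if more than one candidate matches, which can happen in
    * case-insensitive mode when names differ only by case.
    */
  def findField(
      candidates: Seq[String],
      wanted: String,
      caseSensitive: Boolean): Option[String] = {
    val matches =
      if (caseSensitive) candidates.filter(_ == wanted)
      else candidates.filter(_.equalsIgnoreCase(wanted))

    matches match {
      case Seq() => None
      case Seq(single) => Some(single)
      case multiple =>
        throw new IllegalArgumentException(
          s"Ambiguous match for '$wanted': ${multiple.mkString(", ")}")
    }
  }

  def main(args: Array[String]): Unit = {
    // Field names from the Avro schema in the description.
    val avroFields = Seq("foo", "BAR")

    // Case-insensitive (the Spark SQL default): "FOO" resolves to "foo".
    println(findField(avroFields, "FOO", caseSensitive = false)) // Some(foo)

    // Case-sensitive: "FOO" has no exact match.
    println(findField(avroFields, "FOO", caseSensitive = true)) // None
  }
}
```

Under this sketch, the Catalyst columns "FOO" and "bar" from the examples above would resolve to the Avro fields "foo" and "BAR" when the flag is false, and fail to match when it is true.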

Issue Links

relates to: SPARK-34182 [AVRO] Improve error messages when matching Catalyst-to-Avro schemas (Resolved)

links to: [GitHub] Pull Request #31201 (xkrogen)

People

Assignee: Erik Krogen
Reporter: Erik Krogen
Votes: 0
Watchers: 3

Dates

Created: 15/Jan/21 22:19
Updated: 25/Jan/21 04:55
Resolved: 25/Jan/21 04:55