Details
- Type: Bug
- Status: Resolved
- Priority: Major
- Resolution: Fixed
- Affects Version/s: 2.4.0, 3.2.0
Description
Spark SQL is case-insensitive by default, but AvroSerializer and AvroDeserializer currently match Catalyst schemas against Avro schemas in a case-sensitive manner. For example, the following write will fail:
val avroSchema = """ |{ | "type" : "record", | "name" : "test_schema", | "fields" : [ | {"name": "foo", "type": "int"}, | {"name": "BAR", "type": "int"} | ] |} """.stripMargin val df = Seq((1, 3), (2, 4)).toDF("FOO", "bar") df.write.option("avroSchema", avroSchema).format("avro").save(savePath)
The same is true on the read path: assuming testAvro has been written with the schema above, the following will fail to match the fields:
spark.read
  .schema(new StructType().add("FOO", IntegerType).add("bar", IntegerType))
  .format("avro")
  .load(testAvro)
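A similar workaround is possible on the read path: declare the user-provided schema with the casing used in the Avro file and rename the columns afterwards. Again only a sketch, assuming the spark session and the testAvro path from the description.

import org.apache.spark.sql.types.{IntegerType, StructType}

// Match the casing stored in the Avro file ("foo", "BAR"), then rename to the desired names.
val loaded = spark.read
  .schema(new StructType().add("foo", IntegerType).add("BAR", IntegerType))
  .format("avro")
  .load(testAvro)
val renamed = loaded
  .withColumnRenamed("foo", "FOO")
  .withColumnRenamed("BAR", "bar")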
Issue Links
- relates to: SPARK-34182 [AVRO] Improve error messages when matching Catalyst-to-Avro schemas (Resolved)