[SPARK-34365] Support configurable Avro schema field matching for positional or by-name - ASF JIRA

Details

Type: Improvement
Status: Resolved
Priority: Major
Resolution: Fixed
Affects Version/s: 3.0.1
Fix Version/s: 3.2.0
Component/s: SQL
Labels: None
Target Version/s: 3.2.0

Description

When reading an Avro dataset (using either the dataset's own schema or one supplied through 'avroSchema'), or when writing an Avro dataset with a schema provided through 'avroSchema', Catalyst fields are currently matched to Avro fields by field name.

This behavior is fairly recent: prior to SPARK-27762 (fixed in 3.0.0), at least on the write path, the schemas were matched positionally (a "structural" comparison). While I agree that matching by name is a much more sensible default, I propose making this behavior configurable through an option on the Avro datasource. Even when SPARK-27762 was handled, there was interest in making this behavior configurable, but it appears to have gone unaddressed.

There is precedent for making this behavior configurable: SPARK-32864 added this support for ORC. Beyond that precedent, Hive itself matches positionally (ref), so this is behavior that Hadoop/Hive ecosystem users are familiar with:

Hive is very forgiving about types: it will attempt to store whatever value matches the provided column in the equivalent column position in the new table. No matching is done on column names, for instance.
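To make the distinction between the two strategies concrete, here is a minimal, library-free sketch in Python. It is only an illustration of by-name versus positional pairing of fields, not Spark's actual implementation; the function names, the `(name, type)` tuple representation, and the sample schemas are all hypothetical.

```python
from typing import List, Tuple

Field = Tuple[str, str]              # (field name, type name), hypothetical representation
Pairing = List[Tuple[Field, Field]]  # (Catalyst field, Avro field) pairs


def match_by_name(catalyst: List[Field], avro: List[Field]) -> Pairing:
    """Pair each Catalyst field with the Avro field that has the same name."""
    avro_by_name = {field[0]: field for field in avro}
    pairs = []
    for field in catalyst:
        if field[0] not in avro_by_name:
            raise ValueError(f"no Avro field named {field[0]!r}")
        pairs.append((field, avro_by_name[field[0]]))
    return pairs


def match_by_position(catalyst: List[Field], avro: List[Field]) -> Pairing:
    """Pair the i-th Catalyst field with the i-th Avro field, ignoring names."""
    if len(catalyst) != len(avro):
        raise ValueError("schemas have different numbers of fields")
    return list(zip(catalyst, avro))


# Illustrative schemas (assumed for this example only).
catalyst_fields = [("id", "long"), ("name", "string")]
avro_reordered = [("name", "string"), ("id", "long")]
avro_renamed = [("user_id", "long"), ("user_name", "string")]

# By name: field order does not matter, but the names must line up.
print(match_by_name(catalyst_fields, avro_reordered))
# By position: names are ignored, so renamed fields still pair up.
print(match_by_position(catalyst_fields, avro_renamed))
```

In this sketch, by-name matching tolerates the reordered schema but raises an error on the renamed one, while positional matching accepts the renamed schema but would pair `id` (long) with `name` (string) if applied to the reordered one. That trade-off is why by-name is the safer default and positional matching is better suited to an opt-in setting.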

Issue Links

is related to

SPARK-27762 Support user provided avro schema for writing fields with different ordering (Resolved)

SPARK-35918 Consolidate logic between AvroSerializer/AvroDeserializer for schema mismatch handling and error messages (Resolved)

SPARK-32864 Support ORC forced positional evolution (Resolved)

relates to

SPARK-34378 Loosen AvroSerializer validation to allow extra nullable user-provided fields (Resolved)

links to

[Github] Pull Request #31490 (xkrogen)

People

Assignee: Erik Krogen

Reporter: Erik Krogen

Votes: 0

Watchers: 3

Dates

Created: 04/Feb/21 18:58

Updated: 15/Sep/21 19:51

Resolved: 30/Jun/21 08:21