[IMPALA-13364] Schema resolution doesn't work for migrated partitioned Iceberg tables that have complex types - ASF JIRA

Details

Type: Bug
Status: Resolved
Priority: Major
Resolution: Fixed
Affects Version/s: None
Fix Version/s: Impala 4.5.0
Component/s: None
Labels:
- impala-iceberg

Description

Schema resolution doesn't work correctly for migrated partitioned Iceberg tables that have complex types.

When we encounter a Parquet/ORC file in an Iceberg table whose file metadata contains no field IDs, we assume that it is an old data file written before the migration and that its schema is the table's very first schema. Hence we can mimic Iceberg's field ID generation to assign field IDs to the file schema elements.

This process didn't take the partition columns into account. This only matters when there are complex types in the table: partition columns are always the last columns in legacy Hive tables, and field IDs are assigned via a "BFS-like" traversal, so in the table schema the partition columns receive their IDs after the other top-level columns but before any nested fields. That is, if there are only primitive types in the table we don't have any problems, but the children of complex-typed columns are assigned incorrect field IDs.
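The effect can be illustrated with a small sketch. This is not Impala's actual code: the function name `assign_field_ids`, the tuple-based type encoding, and the `num_partition_cols` parameter are all hypothetical, and the traversal is a simplified model of the "BFS-like" ID assignment described above (all fields of a level get IDs first, then their children are visited in order).

```python
def assign_field_ids(file_columns, num_partition_cols=0):
    """Assign field IDs to the columns found in a data file (illustrative only).

    file_columns: list of (name, type) pairs, where type is a str for a
    primitive, ("struct", [(name, type), ...]), ("list", element_type),
    or ("map", key_type, value_type).
    num_partition_cols: number of partition columns that exist in the table
    schema (as the last top-level columns) but not in the data file.
    """
    ids = {}
    next_id = 1

    def visit_struct(fields, prefix, is_top_level):
        nonlocal next_id
        # Every field at this level gets its ID before any child does.
        for name, _ in fields:
            ids[prefix + name] = next_id
            next_id += 1
        if is_top_level:
            # Reserve the IDs taken by the partition columns in the table schema.
            next_id += num_partition_cols
        for name, typ in fields:
            visit_type(typ, prefix + name + ".")

    def visit_type(typ, prefix):
        nonlocal next_id
        if isinstance(typ, str):
            return  # primitive: no children to number
        kind = typ[0]
        if kind == "struct":
            visit_struct(typ[1], prefix, False)
        elif kind == "list":
            ids[prefix + "element"] = next_id
            next_id += 1
            visit_type(typ[1], prefix + "element.")
        elif kind == "map":
            ids[prefix + "key"] = next_id
            ids[prefix + "value"] = next_id + 1
            next_id += 2
            visit_type(typ[1], prefix + "key.")
            visit_type(typ[2], prefix + "value.")
        else:
            raise ValueError(f"unknown type kind: {kind}")

    visit_struct(file_columns, "", True)
    return ids


# A data file with columns (id INT, s STRUCT<a:INT, b:STRING>) in a table
# that has one partition column.
cols = [("id", "int"), ("s", ("struct", [("a", "int"), ("b", "string")]))]
ignoring_partitions = assign_field_ids(cols)      # s.a -> 3, s.b -> 4
with_partitions = assign_field_ids(cols, 1)       # s.a -> 4, s.b -> 5
```

In this model the top-level IDs are identical in both calls, which matches the observation that tables with only primitive types are unaffected; only the nested fields shift by the number of partition columns.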

People

Assignee: Zoltán Borók-Nagy

Reporter: Zoltán Borók-Nagy

Votes: 0

Watchers: 3

Dates

Created: 05/Sep/24 14:00

Updated: 04/Oct/24 06:30

Resolved: 04/Oct/24 06:30