[IMPALA-12667] Mention interoperability considerations for Iceberg table conversion - ASF JIRA

Details

Type: Documentation
Status: Open
Priority: Major
Resolution: Unresolved
Affects Version/s: None
Fix Version/s: None
Component/s: None
Labels:
- impala-iceberg

Target Version: Impala 4.4.0

Description

When Impala writes legacy tables with STRING columns, it does not add the UTF8 annotation to those columns in the Parquet files. It omits the annotation because users might store binary data in STRING columns (Impala added support for BINARY columns only recently).

When a legacy table is converted to Iceberg, the data files are not re-written, i.e. we just create the Iceberg metadata files over the existing data files.

The Iceberg spec requires STRING columns to be stored with the UTF8 annotation in Parquet files. Non-Impala readers might throw exceptions when they encounter STRING columns without the UTF8 annotation.
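To illustrate the problem, here is a minimal Python sketch of how one might find the affected columns in a data file. It assumes the third-party `pyarrow` package is installed. The helper names (`read_column_info`, `unannotated_byte_array_columns`) are hypothetical, and the type-name strings reflect how pyarrow is assumed to report Parquet metadata. The file alone cannot say whether an unannotated BYTE_ARRAY column was meant as STRING or BINARY, so the result still has to be compared against the table schema.

```python
from typing import Iterable, List, Tuple

# (column path, physical type, logical type name, converted type name)
ColumnInfo = Tuple[str, str, str, str]


def unannotated_byte_array_columns(columns: Iterable[ColumnInfo]) -> List[str]:
    """Return paths of BYTE_ARRAY columns that carry no annotation at all.

    These are the columns a strict reader may not accept as Iceberg STRING.
    """
    missing = []
    for path, physical, logical, converted in columns:
        if physical.upper() != "BYTE_ARRAY":
            continue  # only BYTE_ARRAY columns can back a STRING column
        # Assumption: an absent annotation is reported as "NONE" (or "UNDEFINED").
        no_logical = logical.upper() in ("NONE", "UNDEFINED")
        no_converted = converted.upper() in ("NONE", "UNDEFINED")
        if no_logical and no_converted:
            missing.append(path)
    return missing


def read_column_info(parquet_path: str) -> List[ColumnInfo]:
    """Collect per-column type metadata from a Parquet file footer."""
    # Imported lazily so the pure check above works without pyarrow.
    import pyarrow.parquet as pq

    schema = pq.ParquetFile(parquet_path).schema
    infos = []
    for i in range(len(schema.names)):
        col = schema.column(i)
        infos.append(
            (col.path, col.physical_type, col.logical_type.type, col.converted_type)
        )
    return infos
```

Calling `unannotated_byte_array_columns(read_column_info(path))` on a data file of a converted table lists the columns that may need attention before other engines read the table.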

Add a section about the above to the docs. Also mention CTAS (CREATE TABLE AS SELECT) statements as a possible workaround, since a CTAS writes new data files instead of reusing the existing ones.

People

Assignee: Unassigned

Reporter: Zoltán Borók-Nagy

Votes: 0

Watchers: 2

Dates

Created: 02/Jan/24 10:21

Updated: 02/Jan/24 10:21