Details
- Type: Bug
- Status: Resolved
- Priority: Major
- Resolution: Fixed
- Labels: ghx-label-7
Description
Repro steps:
1. Create a table with Hive that has a TS_LTZ column:
    create table ice_hive_tbl (i int, ts_ltz timestamp with local time zone) stored by iceberg;
2. Insert some data using Hive:
    insert into ice_hive_tbl values (1, current_timestamp());
3. Add a breakpoint in Impala's table-loading code, right before Impala sends an alter_table call to HMS to change the column type from TS_LTZ to TS. Here, for instance.
4. Query the table from Impala. This triggers a table load. Impala decides that it should change the column's TS_LTZ type to TS, but the breakpoint holds it right before it does so.
5. Use Hive to add additional rows to the table:
    insert into ice_hive_tbl values (2, current_timestamp());
    insert into ice_hive_tbl values (3, current_timestamp());
6. Release the breakpoint and let Impala finish the SELECT query started in step 4.
7. Run another SELECT * from Hive and/or Impala and verify that the extra rows added in step 5 are not present in the table.
Root cause:
When Impala changes the TS_LTZ column to TS, it does so by calling alter_table() on HMS directly, handing HMS a Metastore Table object as the desired state of the table. HMS then persists this table object.
The problems with this (see the sketch after this list):
- Impala doesn't use the Iceberg API to alter the table. As a result no conflict detection is performed, and it goes undetected that other commits have landed on the table since Impala grabbed the table object from HMS.
- The metadata.json path is part of the table properties, so when Impala calls alter_table(tbl_obj), HMS also persists this stale metadata path, even though other changes have already moved the metadata path forward.
- Essentially, this reverts the table to the state it was in when Impala loaded the table object from HMS.
- In a high-frequency scenario this could cause real damage: while Hive (or even Spark) is heavily writing the table and Impala is meanwhile reading it, some snapshots could be unintentionally reverted, resulting in data loss or in changes such as deletes being undone.
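A minimal sketch of the race, written directly against the HMS Thrift client; the database/table names are placeholders, and the property key "metadata_location" is the one Iceberg's HMS integration uses for the current metadata.json pointer:

    import org.apache.hadoop.hive.conf.HiveConf;
    import org.apache.hadoop.hive.metastore.HiveMetaStoreClient;
    import org.apache.hadoop.hive.metastore.api.Table;

    public class StaleAlterTableSketch {
      public static void main(String[] args) throws Exception {
        HiveMetaStoreClient hms = new HiveMetaStoreClient(new HiveConf());

        // (1) Table load: the current metadata.json pointer rides along as a
        // table property inside the fetched Table object.
        Table snapshot = hms.getTable("default", "ice_hive_tbl");
        String staleLocation = snapshot.getParameters().get("metadata_location");

        // (2) ... meanwhile Hive commits new rows, so HMS now points at a
        // newer metadata.json than staleLocation ...

        // (3) The type fix rewrites the column and sends the WHOLE object
        // back. alter_table() persists every field of `snapshot`, including
        // the stale "metadata_location", silently reverting Hive's commits.
        snapshot.getSd().getCols().get(1).setType("timestamp"); // the ts_ltz column
        hms.alter_table("default", "ice_hive_tbl", snapshot);
      }
    }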
Just a note, FWIW: with the current approach Impala doesn't change the column types in the Iceberg metadata, only in HMS. So even with this, the Iceberg metadata would still show the column type as timestamptz.
Note 2: I described this problem using timestamp with local time zone as an example, but it could also be triggered by other column types that are not entirely compatible with Impala. I haven't researched whether there is any other such type, though.
Note 3: this issue seems to have been around forever. The code that triggers it was added by one of the first Iceberg integration changes, the "Create Iceberg table" change.
Possible solutions:
1. Impala can do the alter table by calling the Iceberg API instead of HMS directly.
There are things to be careful about (a sketch of this route follows the list):
- With this approach, would the above repro steps make the table loading fail due to a conflict between commits on the table? Or could the Iceberg library automatically merge the schema change onto the latest state even if there had been changes on the table in the meantime? I think this would work as expected and wouldn't reject loading the table, but we should make sure of it when testing.
- With this approach Impala would set the TS_LTZ columns to TS properly, so no snapshots would be lost. However, when a new write is performed by Hive/Spark, they'd set the column types back to TS_LTZ; then when Impala reads the table again, it would set these columns to TS again, and so on. The question is whether a scenario like this would flood the Iceberg metadata, e.g. metadata.json, with all these uncontrolled schema changes.
- We talk about schema changes here, but what the code does now is much wider than that: it sends a whole table object to HMS to persist. We have to double-check whether the current approach only persists schema changes or could make other changes too. E.g. the code also sets the DO_NOT_UPDATE_STATS property and the last DDL time. Could it change anything else that we might miss with this approach?
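A rough sketch of what the Iceberg-API route could look like; the catalog wiring is illustrative, and note that Iceberg's schema-evolution rules may reject timestamptz -> timestamp outright (it is not one of the allowed type promotions), which is exactly the kind of thing that needs verifying here:

    import java.util.Collections;
    import org.apache.iceberg.Table;
    import org.apache.iceberg.catalog.TableIdentifier;
    import org.apache.iceberg.hive.HiveCatalog;
    import org.apache.iceberg.types.Types;

    public class IcebergApiAlterSketch {
      public static void main(String[] args) {
        HiveCatalog catalog = new HiveCatalog();
        catalog.initialize("hive", Collections.emptyMap());
        Table table = catalog.loadTable(TableIdentifier.of("default", "ice_hive_tbl"));

        // Committing through the Iceberg API gives optimistic concurrency:
        // commit() validates against the table's latest metadata instead of
        // blindly overwriting it, so concurrent Hive/Spark commits are not
        // silently reverted the way a raw alter_table() reverts them.
        table.updateSchema()
            .updateColumn("ts_ltz", Types.TimestampType.withoutZone())
            .commit();
      }
    }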
2. Do not do any alter_table calls after loading the Iceberg table.
This approach would simply drop the code after this line and wouldn't make any HMS schema changes. Impala could then use the adjusted column types internally, without changing the column types in HMS. The question here is whether this would break any use cases. A sketch of such an internal mapping follows.
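A hypothetical sketch of the internal mapping; the helper name and the Impala-side type strings are made up for illustration. The point is that the adjustment lives only in Impala's in-memory schema and nothing is ever written back to HMS:

    import org.apache.iceberg.types.Type;

    public class InternalTypeMapping {
      // Hypothetical helper: map an Iceberg column type to the type Impala
      // would use internally, without ever issuing alter_table() against HMS.
      static String impalaTypeFor(Type icebergType) {
        if (icebergType.typeId() == Type.TypeID.TIMESTAMP) {
          // Impala exposes a single TIMESTAMP type, so both timestamp and
          // timestamptz would be treated as TIMESTAMP on the Impala side only.
          return "TIMESTAMP";
        }
        return icebergType.toString().toUpperCase();
      }
    }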
Issue Links
- is related to: IMPALA-13499 REFRESH on Iceberg tables can lead to data loss (Open)