Concurrent alter_table calls can interfere with each other and corrupt the metadata_location property of an Iceberg table.
There is no table level locking for Iceberg tables in Hive during the usual operations, which makes some extra performance related features available, such as concurrent inserts, that native Hive tables lack. This was done under the assumption that the optimistic locking pattern used in HiveTableOperations protects changes to metadata_location, with an HMS table lock taken only there.
This is fine until other alter_table calls enter the system, such as one from StatTask or DDLTask. Such tasks perform their work as follows:
- get the current table
- do the alteration
- send the changes via alter_table call to HMS
Between the retrieval of the table and the alter_table call, a legitimate commit from HiveTableOperations might bump the metadata_location. That change then gets reverted, because these tasks work from an outdated metadata_location and the alter_table call overwrites all table props, including this one.
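The interleaving above can be reproduced with a toy model. This is only a sketch: ToyMetastore is a hypothetical in-memory stand-in for HMS (not a Hive API), and the file names and the numRows value are made up for illustration. The only behaviour it borrows from the description is that alter_table replaces all table props.

```java
import java.util.HashMap;
import java.util.Map;

// Toy reproduction of the lost update; nothing here is a real Hive or Iceberg class.
class LostUpdateSketch {
    static final String METADATA_LOCATION = "metadata_location";

    /** Hypothetical in-memory stand-in for HMS holding the props of one table. */
    static class ToyMetastore {
        private Map<String, String> props = new HashMap<>();

        synchronized Map<String, String> getTable() {
            return new HashMap<>(props); // callers work on their own snapshot
        }

        // Modelled on the description: the call replaces ALL table props.
        synchronized void alterTable(Map<String, String> newProps) {
            props = new HashMap<>(newProps);
        }
    }

    static String simulate() {
        ToyMetastore hms = new ToyMetastore();
        Map<String, String> initial = new HashMap<>();
        initial.put(METADATA_LOCATION, "v1.metadata.json");
        hms.alterTable(initial);

        // Step 1: the stats task gets the current table.
        Map<String, String> statsView = hms.getTable();

        // In between: an Iceberg commit bumps metadata_location.
        Map<String, String> commitView = hms.getTable();
        commitView.put(METADATA_LOCATION, "v2.metadata.json");
        hms.alterTable(commitView);

        // Steps 2 and 3: the stats task alters its stale copy and sends it back.
        statsView.put("numRows", "100");
        hms.alterTable(statsView);

        return hms.getTable().get(METADATA_LOCATION);
    }

    public static void main(String[] args) {
        // The commit to v2 has been silently reverted.
        System.out.println(simulate()); // prints v1.metadata.json
    }
}
```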
This is a design issue. To solve it while preserving the concurrency features, I propose to make use of HiveIcebergMetaHook, where all such alter_table calls are intercepted, and to use the same locking mechanism there as the one found in HiveTableOperations. The proposed flow on the HMS client side would be:
- hook: preAlterTable
  - request table level lock
  - refresh the Iceberg table from catalog (HMS) to see if new updates have arrived
  - compare the current metadata with the one thought to be the base of this request; if metadata_location is outdated, overwrite it in this request with the fresh, current one
- do the alter_table call to HMS with the relevant changes (updated stats or other properties)
- hook: post/rollbackAlterTable
  - release table level lock
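The flow above could look roughly like the following sketch. It is not the HiveIcebergMetaHook implementation: ToyMetastore, the ReentrantLock standing in for the HMS table level lock, and the method names preAlterTable, postOrRollbackAlterTable and alterThroughHook are all assumptions made for illustration.

```java
import java.util.HashMap;
import java.util.Map;
import java.util.concurrent.locks.ReentrantLock;

// Toy version of the proposed flow; nothing here is a real Hive or Iceberg class.
class LockedAlterSketch {
    static final String METADATA_LOCATION = "metadata_location";

    /** Hypothetical in-memory stand-in for HMS holding the props of one table. */
    static class ToyMetastore {
        // Stand-in for the HMS table level lock (assumption: a plain JVM lock).
        final ReentrantLock tableLock = new ReentrantLock();
        private Map<String, String> props = new HashMap<>();

        synchronized Map<String, String> getTable() {
            return new HashMap<>(props);
        }

        synchronized void alterTable(Map<String, String> newProps) {
            props = new HashMap<>(newProps); // replaces all table props
        }
    }

    // hook: preAlterTable -> lock, refresh, fix up an outdated metadata_location
    static void preAlterTable(ToyMetastore hms, Map<String, String> request) {
        hms.tableLock.lock();
        String current = hms.getTable().get(METADATA_LOCATION);
        if (current != null && !current.equals(request.get(METADATA_LOCATION))) {
            request.put(METADATA_LOCATION, current);
        }
    }

    // hook: post/rollbackAlterTable -> release the lock
    static void postOrRollbackAlterTable(ToyMetastore hms) {
        hms.tableLock.unlock();
    }

    static void alterThroughHook(ToyMetastore hms, Map<String, String> request) {
        preAlterTable(hms, request);
        try {
            hms.alterTable(request);
        } finally {
            postOrRollbackAlterTable(hms); // runs on success and on failure
        }
    }

    static Map<String, String> simulate() {
        ToyMetastore hms = new ToyMetastore();
        Map<String, String> initial = new HashMap<>();
        initial.put(METADATA_LOCATION, "v1.metadata.json");
        hms.alterTable(initial);

        // The stats task gets the table before the commit happens.
        Map<String, String> statsView = hms.getTable();

        // The Iceberg commit takes the same lock and bumps metadata_location.
        hms.tableLock.lock();
        try {
            Map<String, String> commitView = hms.getTable();
            commitView.put(METADATA_LOCATION, "v2.metadata.json");
            hms.alterTable(commitView);
        } finally {
            hms.tableLock.unlock();
        }

        // The stats task now goes through the hook with its stale copy.
        statsView.put("numRows", "100");
        alterThroughHook(hms, statsView);
        return hms.getTable();
    }

    public static void main(String[] args) {
        System.out.println(simulate().get(METADATA_LOCATION)); // prints v2.metadata.json
    }
}
```

In this toy run the stats update is kept and the commit to v2 survives, because the stale metadata_location in the request is replaced while the lock is held.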
This can work because the metadata_location should never be changed by anything other than HiveTableOperations, which is the only caller not using this hook (if it did, we'd be in an endless loop). There is one exception: a user who wants to change the metadata_location by hand. I can handle that case by signalling it through an environmentContext instance when the corresponding AlterTableSetPropertiesDesc is constructed.
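The manual-change exception might be decided along these lines. This is a sketch under assumptions: the environment context is modelled as a plain string map, and the flag name manual_metadata_location_change is hypothetical, not an existing Hive property.

```java
import java.util.Collections;
import java.util.HashMap;
import java.util.Map;

// Toy decision logic for the manual override; not a real Hive or Iceberg class.
class ManualOverrideSketch {
    // Hypothetical flag name, invented for this sketch.
    static final String MANUAL_CHANGE_FLAG = "manual_metadata_location_change";

    /**
     * Picks the metadata_location to send in the alter_table request.
     * requested: the value carried by the request (possibly stale)
     * current:   the value just refreshed from the catalog
     */
    static String resolve(String requested, String current, Map<String, String> envContext) {
        // parseBoolean(null) is false, so a missing flag means "not manual".
        boolean manual = Boolean.parseBoolean(envContext.get(MANUAL_CHANGE_FLAG));
        if (manual || current == null) {
            return requested; // the user explicitly asked for this location
        }
        return current; // otherwise the fresh value always wins
    }

    public static void main(String[] args) {
        Map<String, String> manualCtx = new HashMap<>();
        manualCtx.put(MANUAL_CHANGE_FLAG, "true");

        // A stats-style request with a stale value is corrected.
        System.out.println(resolve("v1.metadata.json", "v2.metadata.json",
                Collections.emptyMap())); // prints v2.metadata.json
        // A flagged manual change is passed through untouched.
        System.out.println(resolve("custom.metadata.json", "v2.metadata.json",
                manualCtx)); // prints custom.metadata.json
    }
}
```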