Details
Type: Bug
Status: In Progress
Priority: Major
Resolution: Unresolved
Description
This is a follow-up to IMPALA-10926. The idea is that when any DDL operation is performed from the Impala shell, catalogd also syncs the db/table to its latest event ID as per HMS. This way, updates to a db/table are applied in the same order in which they appear in the HMS notification log, which ensures consistency. Currently, catalogd applies any update received from the Impala shell in place. Instead, it should perform the HMS operation first and then replay all HMS events since the last synced event.
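The proposed DDL flow (HMS operation first, then replay of newer events) can be sketched as follows. This is a minimal, hypothetical Java model, not catalogd's actual code: `NotificationEvent`, `CachedTable`, `TableEventSyncer`, `syncToLatestEvent`, and `executeDdl` are illustrative names, and the notification log is assumed to be an in-memory list sorted by ascending event ID. Only events for the given table whose IDs exceed its last synced event ID are applied, so updates land in notification-log order.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.function.Supplier;

// Hypothetical, simplified view of one HMS notification log entry.
record NotificationEvent(long eventId, String dbName, String tableName, String eventType) {}

// Hypothetical stand-in for a cached catalog table that remembers the last HMS event applied to it.
final class CachedTable {
  final String dbName;
  final String tableName;
  long lastSyncedEventId;
  // Records which events were replayed, in order; a real catalog would mutate table metadata.
  final List<String> appliedEventTypes = new ArrayList<>();

  CachedTable(String dbName, String tableName, long lastSyncedEventId) {
    this.dbName = dbName;
    this.tableName = tableName;
    this.lastSyncedEventId = lastSyncedEventId;
  }
}

final class TableEventSyncer {
  private TableEventSyncer() {}

  // Replays, in log order, every event for this table whose ID is greater than its last synced
  // event ID. Assumes notificationLog is sorted by ascending eventId. Returns the number applied.
  static int syncToLatestEvent(CachedTable table, List<NotificationEvent> notificationLog) {
    int applied = 0;
    for (NotificationEvent event : notificationLog) {
      boolean sameTable = event.dbName().equals(table.dbName)
          && event.tableName().equals(table.tableName);
      if (!sameTable || event.eventId() <= table.lastSyncedEventId) {
        continue; // event for another object, or already applied
      }
      table.appliedEventTypes.add(event.eventType());
      table.lastSyncedEventId = event.eventId();
      applied++;
    }
    return applied;
  }

  // Hypothetical DDL path: perform the operation in HMS first, then replay events
  // instead of updating the cached table in place.
  static int executeDdl(CachedTable table, Runnable hmsOperation,
      Supplier<List<NotificationEvent>> fetchNotificationLog) {
    hmsOperation.run();
    return syncToLatestEvent(table, fetchNotificationLog.get());
  }
}
```

Tracking a per-table last synced event ID makes the replay safe to repeat: a second sync with no new events applies nothing, because every event at or below that ID is skipped.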
However, there are subtle differences between how Impala processes DDLs issued from the shell and how it processes HMS events:
- When processing an alter table event, catalogd currently does a full table reload. This has a performance impact because a table reload is time consuming. In contrast, the in-place alter table DDL operation in catalogOpExecutor (via the Impala shell) is faster, since it detects whether to reload the table schema, the file metadata, or both. The alter table event processing logic needs to be improved to detect whether the file metadata must be reloaded. --> This is addressed by IMPALA-11534.
- A similar improvement is required when processing alter partition events. Currently, when processing an AlterPartition HMS event, catalogd always reloads file metadata, whereas the same operation from the shell reloads metadata only when it is required.
- For DDLs issued from the Impala shell, Hive functions are already cached in the catalog Db object. However, catalogd does not process CREATE/DROP function HMS events.
- When a db/table is created from the Impala shell and the operation fails because the db/table already exists, catalogd has no reliable way to determine the create event ID for that db/table. The create event ID is required so that, for any subsequent DDL operation, catalogd can process HMS events starting from that ID.
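The alter table and alter partition items above both come down to deciding, per alter event, what actually needs reloading. Below is a minimal Java sketch of that decision, assuming the event provides before and after snapshots of the table or partition. `HmsObjectSnapshot`, `ReloadScope`, and `AlterEventReloadPlanner.plan` are hypothetical names, and the fields compared are illustrative choices, not Impala's actual logic.

```java
import java.util.EnumSet;
import java.util.List;
import java.util.Map;
import java.util.Objects;

// Hypothetical snapshot of the HMS fields of a table or partition that drive the reload decision.
record HmsObjectSnapshot(String location, String inputFormat, List<String> columns,
    Map<String, String> parameters) {}

// Hypothetical reload granularities.
enum ReloadScope { TABLE_METADATA, FILE_METADATA }

final class AlterEventReloadPlanner {
  private AlterEventReloadPlanner() {}

  // Diffs the before/after objects of an alter event and returns only the reloads the change
  // requires; an empty set means nothing relevant changed.
  static EnumSet<ReloadScope> plan(HmsObjectSnapshot before, HmsObjectSnapshot after) {
    EnumSet<ReloadScope> scope = EnumSet.noneOf(ReloadScope.class);
    // Column or property changes: refresh cached metadata; the file listing stays valid.
    if (!Objects.equals(before.columns(), after.columns())
        || !Objects.equals(before.parameters(), after.parameters())) {
      scope.add(ReloadScope.TABLE_METADATA);
    }
    // A new location or input format can change which files exist or how they are read.
    if (!Objects.equals(before.location(), after.location())
        || !Objects.equals(before.inputFormat(), after.inputFormat())) {
      scope.add(ReloadScope.FILE_METADATA);
    }
    return scope;
  }
}
```

Treating a property-only change (for example, a comment update) as metadata-only is what avoids the full reload. The exact set of fields that should force a file metadata reload is an assumption here and may differ from what IMPALA-11534 implements.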