Details
- Type: Improvement
- Status: Resolved
- Priority: Major
- Resolution: Fixed
- Impala 3.1.0
- None
Description
Currently, Impala offers multiple ways to invalidate or refresh the table metadata stored in the catalog. Catalog objects can be invalidated based on usage (invalidate_tables_timeout_s) or under GC pressure (invalidate_tables_on_memory_pressure), as added in IMPALA-7448. However, most users issue invalidate commands when they want to sync to the latest information from HDFS or HMS. Unfortunately, when data is modified or added outside Impala (e.g. by Hive) or by a different Impala cluster, users don't have a clear idea whether they need to issue an invalidate. To be on the safe side, users issue invalidate commands more often than necessary, which causes performance as well as stability issues.
Hive Metastore provides a simple API to get incremental updates to the metadata stored in its database. Each API call that performs an add/alter/drop operation in the metastore generates one or more events, which can be fetched using the get_next_notification API. Each event has a unique, increasing event_id. The current notification event id can be fetched using the get_current_notificationEventId API.
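To make the event-id semantics concrete, here is a minimal sketch that uses an in-memory stand-in for the metastore notification log. It does not call the real metastore client: `FakeNotificationLog`, `record`, `next_events` and `current_event_id` are hypothetical names that only mimic the behaviour described above (increasing ids, fetching everything after a known id).

```python
from dataclasses import dataclass, field
from typing import List


@dataclass(frozen=True)
class NotificationEvent:
    event_id: int      # unique and increasing
    event_type: str    # e.g. "ADD_PARTITION"
    db_name: str
    table_name: str


@dataclass
class FakeNotificationLog:
    """Hypothetical in-memory stand-in for the metastore notification log."""
    events: List[NotificationEvent] = field(default_factory=list)

    def record(self, event_type: str, db_name: str, table_name: str) -> NotificationEvent:
        # Every add/alter/drop operation appends an event with the next id.
        event = NotificationEvent(len(self.events) + 1, event_type, db_name, table_name)
        self.events.append(event)
        return event

    def current_event_id(self) -> int:
        # Mimics the idea behind get_current_notificationEventId.
        return self.events[-1].event_id if self.events else 0

    def next_events(self, last_event_id: int, max_events: int) -> List[NotificationEvent]:
        # Mimics the idea behind get_next_notification: only events newer
        # than the id the caller has already seen, capped at max_events.
        newer = [e for e in self.events if e.event_id > last_event_id]
        return newer[:max_events]


log = FakeNotificationLog()
log.record("CREATE_TABLE", "sales", "orders")
log.record("ADD_PARTITION", "sales", "orders")
log.record("DROP_TABLE", "sales", "tmp")

# A client that has already processed event 1 asks for what came after it.
batch = log.next_events(last_event_id=1, max_events=10)
```

Because ids only grow, a client needs to remember just the last id it processed in order to resume without missing or repeating events.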
This JIRA proposes to use these metastore events to proactively invalidate or refresh information in CatalogD. When configured, CatalogD could poll for such events and take action (such as add/drop/refresh partition, or add/drop/invalidate tables and databases) based on them. This way the CatalogD state is refreshed automatically from events, which would greatly help use cases where users want to see the latest information (within a configurable delay) without flooding the system with invalidate requests.
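The proposed behaviour can be sketched as a small dispatcher that turns each polled event into a catalog action and remembers the last event id it applied. This is an illustration only, not the CatalogD implementation: `CatalogSketch`, the `ACTIONS` mapping, and the choice to ignore unknown event types are all assumptions made for the example.

```python
from typing import Iterable, List, Tuple

# (event_id, event_type, fully qualified object name)
Event = Tuple[int, str, str]

# Assumed mapping from metastore event types to the actions named above.
ACTIONS = {
    "CREATE_TABLE": "add table",
    "DROP_TABLE": "drop table",
    "ALTER_TABLE": "invalidate table",
    "ADD_PARTITION": "add partition",
    "DROP_PARTITION": "drop partition",
    "ALTER_PARTITION": "refresh partition",
}


class CatalogSketch:
    """Hypothetical stand-in for the catalog side of event processing."""

    def __init__(self) -> None:
        self.last_synced_event_id = 0
        self.actions: List[str] = []

    def apply(self, events: Iterable[Event]) -> None:
        # Process in event-id order and skip anything already applied,
        # so overlapping polls do not repeat work.
        for event_id, event_type, name in sorted(events):
            if event_id <= self.last_synced_event_id:
                continue
            action = ACTIONS.get(event_type)
            if action is not None:  # assumption: unknown types are ignored
                self.actions.append(f"{action} {name}")
            self.last_synced_event_id = event_id


catalog = CatalogSketch()
# First poll.
catalog.apply([
    (1, "CREATE_TABLE", "sales.orders"),
    (2, "ADD_PARTITION", "sales.orders"),
])
# Second poll overlaps with the first; event 2 is not applied twice.
catalog.apply([
    (2, "ADD_PARTITION", "sales.orders"),
    (3, "DROP_TABLE", "sales.tmp"),
])
```

In a real deployment the `apply` call would sit inside a loop that runs once per configured polling interval, which is what bounds the delay before users see external changes.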
I will attach a design doc to this JIRA and create subtasks for the work. Feel free to comment on the JIRA or suggest improvements to the design.
Attachments
Issue Links
- causes
  - IMPALA-10468 DROP events which are generated while a batch is being processed may add table incorrectly (Resolved)
- incorporates
  - IMPALA-8592 Add support for insert events for 'LOAD DATA..' statements from Impala. (Resolved)
  - IMPALA-10273 Support function events (Open)
  - IMPALA-9857 Batch ALTER_PARTITION events (Resolved)
- is duplicated by
  - IMPALA-3124 Add external metadata notification mechanisms to Impala (Resolved)
- is related to
  - IMPALA-11533 EventProcessor Completeness (Open)
  - IMPALA-9101 Unneccessary REFRESH due to wrong self-event detection (Resolved)
  - IMPALA-5151 Adding partition on impala over a table with old metadata (Resolved)
  - IMPALA-8600 Reload partition does not work for transactional tables (Resolved)
  - IMPALA-10923 Fine grained table refreshing at partition level events for transactional tables (Resolved)
- relates to
  - IMPALA-4272 Automatic invalidation HDFS file and block metadata (Open)