IMPALA-7954

Support automatic invalidates using metastore notification events


Details

    • Type: Improvement
    • Status: Resolved
    • Priority: Major
    • Resolution: Fixed
    • Affects Version/s: Impala 3.1.0
    • Fix Version/s: Impala 4.1.0
    • Component/s: Catalog
    • Labels: None

    Description

      Currently, Impala offers multiple ways to invalidate or refresh the table metadata stored in the Catalog. Objects in the Catalog can be invalidated either with a usage-based approach (invalidate_tables_timeout_s) or when there is GC pressure (invalidate_tables_on_memory_pressure), as added in IMPALA-7448. However, most users issue invalidate commands when they want to sync to the latest information from HDFS or HMS. Unfortunately, when data is modified or added outside Impala (e.g., by Hive) or by a different Impala cluster, users have no clear way to tell whether they need to issue an invalidate. To be on the safe side, users issue invalidate commands more often than necessary, which causes performance and stability issues.

      Hive Metastore provides a simple API to get incremental updates to the metadata stored in its database. Each API call that performs an add/alter/drop operation in the metastore generates one or more events, which can be fetched using the get_next_notification API. Each event has a unique, increasing event_id. The current notification event id can be fetched using the get_current_notificationEventId API.
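      The event-log semantics described above can be illustrated with a small in-memory model. This is only a sketch under stated assumptions: FakeNotificationLog, NotificationEvent, and the snake_case method names are hypothetical stand-ins written for illustration, not the metastore's Thrift client, and the real API returns richer event objects.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass(frozen=True)
class NotificationEvent:
    event_id: int     # unique and strictly increasing
    event_type: str   # e.g. "CREATE_TABLE", "ADD_PARTITION", "DROP_TABLE"
    db_name: str
    table_name: str


@dataclass
class FakeNotificationLog:
    """Hypothetical in-memory stand-in for the metastore notification log."""
    events: List[NotificationEvent] = field(default_factory=list)

    def record(self, event_type: str, db_name: str, table_name: str) -> int:
        # Every add/alter/drop operation appends one event with the next id.
        event_id = len(self.events) + 1
        self.events.append(
            NotificationEvent(event_id, event_type, db_name, table_name))
        return event_id

    def get_current_notification_event_id(self) -> int:
        # Id of the latest event, or 0 if nothing has been recorded yet.
        return self.events[-1].event_id if self.events else 0

    def get_next_notification(self, last_event_id: int,
                              max_events: int) -> List[NotificationEvent]:
        # Return up to max_events events with an id greater than last_event_id.
        newer = [e for e in self.events if e.event_id > last_event_id]
        return newer[:max_events]


log = FakeNotificationLog()
log.record("CREATE_TABLE", "sales", "orders")
log.record("ADD_PARTITION", "sales", "orders")
log.record("DROP_TABLE", "sales", "tmp")
```

      A consumer that remembers the last event_id it processed can ask only for newer events, which is what makes incremental sync possible.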

      This JIRA proposes to use these metastore events to proactively invalidate or refresh information in CatalogD. When configured, CatalogD could poll for such events and take action (such as add/drop/refresh partition, or add/drop/invalidate tables and databases) based on them. This way the CatalogD state is refreshed automatically from events, which would greatly help use cases where users want to see the latest information (within a configurable time delay) without flooding the system with invalidate requests.
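      A minimal sketch of the proposed polling approach follows. It assumes a fetch callable that returns events newer than a given id; EventPoller, ACTIONS, and the action names are hypothetical and chosen only for illustration. The actual mapping from events to catalog actions is left to the design doc.

```python
from typing import Callable, Dict, List, Tuple

# (event_id, event_type, "db.table"); a simplified, hypothetical event shape.
Event = Tuple[int, str, str]

# Hypothetical mapping from metastore event type to a catalog action.
ACTIONS: Dict[str, str] = {
    "CREATE_TABLE": "add_table",
    "DROP_TABLE": "drop_table",
    "ALTER_TABLE": "invalidate_table",
    "ADD_PARTITION": "refresh_partition",
    "DROP_PARTITION": "refresh_partition",
}


class EventPoller:
    def __init__(self, fetch: Callable[[int, int], List[Event]],
                 start_event_id: int) -> None:
        self.fetch = fetch
        self.last_synced_event_id = start_event_id
        self.applied: List[Tuple[str, str]] = []  # (action, target) log

    def poll_once(self, max_events: int = 1000) -> int:
        """Fetch events after the last synced id and apply them in order."""
        events = self.fetch(self.last_synced_event_id, max_events)
        for event_id, event_type, target in events:
            action = ACTIONS.get(event_type)
            if action is not None:  # event types with no mapping are skipped
                self.applied.append((action, target))
            # Advance even for skipped events so they are not fetched again.
            self.last_synced_event_id = event_id
        return len(events)


EVENTS: List[Event] = [
    (1, "CREATE_TABLE", "sales.orders"),
    (2, "ADD_PARTITION", "sales.orders"),
    (3, "CREATE_FUNCTION", "sales.f"),
    (4, "DROP_TABLE", "sales.tmp"),
]


def fetch(last_event_id: int, max_events: int) -> List[Event]:
    return [e for e in EVENTS if e[0] > last_event_id][:max_events]


# Start from event id 1, as if the catalog were already in sync up to there.
poller = EventPoller(fetch, start_event_id=1)
processed = poller.poll_once()
```

      In a running service, poll_once would be called repeatedly with a sleep of the configured interval between calls, which bounds how stale the catalog can get.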

      I will attach a design doc to this JIRA and create subtasks for the work. Feel free to comment on the JIRA or suggest improvements to the design.

      Attachments

        1. Impala_Catalogd_Auto_Metadata_Update_v2.pdf
          171 kB
          Vihang Karajgaonkar
        2. Automatic_invalidate_DesignDoc_v1.pdf
          143 kB
          Vihang Karajgaonkar

        Issue Links

          1. Add support for automatic invalidates by polling metastore events (Sub-task, Resolved, Vihang Karajgaonkar)
          2. Add support to detect insert events from Impala (Sub-task, Resolved, Anurag Mantripragada)
          3. Detect self-events to avoid unnecessary invalidates (Sub-task, Resolved, Vihang Karajgaonkar)
          4. Add support for fine-grained updates at partition level (Sub-task, Resolved, Anurag Mantripragada)
          5. Impala Doc: Doc the options to enable automatic invalidates using metastore notification events (Sub-task, Closed, Alexandra Rodoni)
          6. Improve supportability of the automatic invalidate feature (Sub-task, Resolved, Vihang Karajgaonkar)
          7. Add a flag to disable sync using events at a table level (Sub-task, Resolved, Vihang Karajgaonkar)
          8. Impala Doc: Doc the support for fine-grained updates at partition level (Sub-task, Closed, Alexandra Rodoni)
          9. Add support for alter_database events (Sub-task, Resolved, Xiaomeng Zhang)
          10. Event processor should keep trying if metastore is unavailable (Sub-task, Resolved, Anurag Mantripragada)
          11. Event filtering logic may not filter all the events (Sub-task, Resolved, Vihang Karajgaonkar)
          12. Change metastore configuration template so that table parameters do not exclude impala specific properties (Sub-task, Resolved, Vihang Karajgaonkar)
          13. Fix MetastoreEventsProcessorTest flakiness (Sub-task, Resolved, Vihang Karajgaonkar)
          14. Impala Doc: Doc the flag to disable sync using events at the table level (Sub-task, Closed, Alexandra Rodoni)
          15. Impala Doc: Doc the supportability metrics of the automatic metadata invalidate feature (Sub-task, Closed, Alexandra Rodoni)
          16. Check CREATION_TIME of databases in event processor to avoid incorrect/redundant invalidates (Sub-task, Resolved, Bharath Krishna)
          17. Fetch metastore configuration values to detect misconfigured setups (Sub-task, Resolved, Bharath Krishna)
          18. Impala Doc: Document the feature to detect insert events from Impala (Sub-task, Closed, Alexandra Rodoni)
          19. Impala Doc: Doc the feature for alter_database events (Sub-task, Closed, Alexandra Rodoni)
          20. Support config validation for event processor on HMS-3 (Sub-task, Resolved, Vihang Karajgaonkar)
          21. Ignore trivial alter events (Sub-task, Resolved, Anurag Mantripragada)
          22. Add support for self-event detection for insert events (Sub-task, Resolved, Xiaomeng Zhang)
          23. Insert event should not error when table does not exists (Sub-task, Resolved, Vihang Karajgaonkar)
          24. Enable Event processing by default Impala e2e tests (Sub-task, Resolved, Vihang Karajgaonkar)
          25. Create randomized tests for stressing the event processor (Sub-task, Resolved, Vihang Karajgaonkar)
          26. Configuration validation introduced in IMPALA-8559 can be improved (Sub-task, Closed, Anurag Mantripragada)
          27. Enable event polling by default in tests (Sub-task, Resolved, Vihang Karajgaonkar)

            People

              Assignee: Vihang Karajgaonkar (vihangk1)
              Reporter: Vihang Karajgaonkar (vihangk1)
              Votes: 0
              Watchers: 9
