IMPALA / IMPALA-10976

Sync db/table in catalogd to latest HMS event id for all DDLs from Impala shell

Details

    • Type: Bug
    • Status: In Progress
    • Priority: Major
    • Resolution: Unresolved
    • None
    • None
    • Component/s: Catalog, Frontend

    Description

      This is a follow-up to IMPALA-10926. The idea is that whenever a DDL operation is performed from Impala shell, catalogd also syncs the affected db/table to its latest event ID as per HMS. This way, updates to a db/table are applied in the same order in which they appear in the HMS notification log, which ensures consistency. Currently catalogd applies any update received from Impala shell in place. Instead, it should perform the HMS operation first and then replay all the HMS events since the last synced event.
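      The proposed ordering (HMS operation first, then replay from the last synced event) can be illustrated with a minimal sketch. This is not catalogd code: `Event`, `CachedTable`, `sync_to_latest_event_id` and `run_ddl_then_sync` are hypothetical names, and the notification log is modeled as a plain in-memory list rather than a real HMS client.

```python
from dataclasses import dataclass, field


@dataclass
class Event:
    """Hypothetical stand-in for one entry of the HMS notification log."""
    event_id: int
    table: str
    change: dict  # property updates carried by the event


@dataclass
class CachedTable:
    """Hypothetical stand-in for a table cached in catalogd."""
    name: str
    last_synced_event_id: int
    props: dict = field(default_factory=dict)


def sync_to_latest_event_id(table, notification_log):
    """Replay, in event-id order, every event for `table` newer than its last synced id."""
    pending = sorted(
        (e for e in notification_log
         if e.table == table.name and e.event_id > table.last_synced_event_id),
        key=lambda e: e.event_id,
    )
    for event in pending:
        table.props.update(event.change)
        table.last_synced_event_id = event.event_id
    return len(pending)


def run_ddl_then_sync(table, notification_log, change):
    """Do the (simulated) HMS operation first, then replay instead of updating in place."""
    # Stand-in for the HMS call: HMS records the DDL as a new notification event.
    next_id = max((e.event_id for e in notification_log), default=0) + 1
    notification_log.append(Event(next_id, table.name, change))
    return sync_to_latest_event_id(table, notification_log)


# Event 101 was generated outside Impala and has not been applied to the cache yet.
log = [
    Event(101, "db1.t1", {"comment": "first"}),
    Event(102, "db1.t2", {"owner": "alice"}),
]
t1 = CachedTable("db1.t1", last_synced_event_id=100)
applied = run_ddl_then_sync(t1, log, {"owner": "bob"})
```

      In this sketch the shell DDL is applied only after the older, unsynced event 101, so the cached table ends up reflecting the same order as the log.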

      However, there are subtle differences between how Impala processes DDLs issued via the shell and how it processes HMS events. These are:

      • When processing an alter table event, catalogd currently does a full table reload. This hurts performance because a table reload is time-consuming. In contrast, the in-place alter table DDL operation in catalogOpExecutor (via Impala shell) is faster since it detects whether to reload the table schema, the file metadata, or both. The alter table event processing logic needs to be improved to detect whether the file metadata must be reloaded. --> This is addressed by IMPALA-11534
      • A similar improvement is required for alter partition events. As of now, when processing an AlterPartition HMS event, catalogd always reloads file metadata, whereas the same operation from the shell reloads metadata only when required.
      • Impala shell already caches Hive functions in the catalog's db object, but catalogd does not process CREATE/DROP function HMS events.
      • When creating a db/table from Impala shell, if the operation fails because the db/table already exists, there is no reliable way for catalogd to determine the create event ID for that db/table. The create event ID is required so that, for any subsequent DDL operation, catalogd can process HMS events starting from the create event ID.
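      The first two points ask the event path to decide what to reload instead of always reloading everything. A minimal sketch of such a decision follows, assuming the event exposes the table state before and after the change. `TableSnapshot` and `plan_reload` are hypothetical names, and the criteria used here (column list for schema, storage location for file metadata) are illustrative assumptions, not the rules implemented in IMPALA-11534.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class TableSnapshot:
    """Hypothetical view of a table as carried by an alter event."""
    columns: tuple      # e.g. (("id", "int"), ("name", "string"))
    location: str       # storage location of the table data
    params: tuple = ()  # table properties as (key, value) pairs


def plan_reload(before, after):
    """Return (reload_schema, reload_file_metadata) for an alter event.

    Assumption for illustration only: the schema is reloaded when the column
    list changes, and file metadata is reloaded when the location changes.
    """
    reload_schema = before.columns != after.columns
    reload_file_metadata = before.location != after.location
    return reload_schema, reload_file_metadata


base = TableSnapshot(columns=(("id", "int"),), location="/warehouse/t1")

# A property-only change needs neither reload under these assumptions.
comment_only = TableSnapshot(base.columns, base.location, (("comment", "x"),))

# Adding a column changes the schema but not the files.
added_column = TableSnapshot(base.columns + (("name", "string"),), base.location)

# Moving the table changes which files back it.
moved = TableSnapshot(base.columns, "/warehouse/t1_new")

decisions = [plan_reload(base, s) for s in (comment_only, added_column, moved)]
```

      Under these assumptions, a property-only alter event would skip both reloads, which is the kind of saving the shell path already gets.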

          People

            hemanth619 Sai Hemanth Gantasala
            sourabh912 Sourabh Goyal
            Votes: 0
            Watchers: 6

            Dates

              Created:
              Updated: