IMPALA-12933

Catalogd should set eventTypeSkipList when fetching specific events for a table


Details

    • Type: Bug
    • Status: Resolved
    • Priority: Critical
    • Resolution: Fixed
    • Affects Version/s: None
    • Fix Version/s: Impala 4.4.0
    • Component/s: Catalog
    • Labels: None

    Description

      There are several places where catalogd fetches all events of a specific type on a table. For example, in TableLoader#load(), if the table has an old createEventId, catalogd fetches all CREATE_TABLE events on the table that come after that createEventId.

      Fetching these events is expensive because the filtering is done on the client side, i.e. catalogd fetches all events and filters them locally by event type and table name:
      https://github.com/apache/impala/blob/148888e3ed4f97292499b2e6ee8d5a756dc648d9/fe/src/main/java/org/apache/impala/catalog/TableLoader.java#L98-L102
      https://github.com/apache/impala/blob/b7ddbcad0dd6accb559a3f391a897a8c442d1728/fe/src/main/java/org/apache/impala/catalog/events/MetastoreEventsProcessor.java#L336
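
      The client-side filtering described above can be sketched as follows. This is a simplified, self-contained model, not Impala's actual code: `NotificationEvent` is a hypothetical record standing in for the HMS Thrift struct of the same name, and `fetchAllEventsSince` is a hypothetical simulation of an HMS call that applies no server-side filter. It shows that every event newer than the starting event id is transferred before the type and table checks run.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.stream.Collectors;

// Hypothetical model of client-side event filtering; not Impala's implementation.
public class ClientSideEventFilter {
  // Simplified stand-in for the HMS NotificationEvent Thrift struct (assumption).
  record NotificationEvent(long eventId, String eventType, String dbName, String tableName) {}

  // Simulates an HMS call with no server-side filtering: every event newer than
  // fromEventId is returned (and, in the real system, sent over the network).
  static List<NotificationEvent> fetchAllEventsSince(List<NotificationEvent> hmsLog,
      long fromEventId) {
    List<NotificationEvent> fetched = new ArrayList<>();
    for (NotificationEvent e : hmsLog) {
      if (e.eventId() > fromEventId) fetched.add(e);
    }
    return fetched;
  }

  // Fetches everything, then filters locally by event type and
  // (case-insensitively) by database and table name.
  static List<NotificationEvent> eventsOfTypeForTable(List<NotificationEvent> hmsLog,
      long fromEventId, String eventType, String dbName, String tableName) {
    return fetchAllEventsSince(hmsLog, fromEventId).stream()
        .filter(e -> e.eventType().equals(eventType))
        .filter(e -> e.dbName().equalsIgnoreCase(dbName)
            && e.tableName().equalsIgnoreCase(tableName))
        .collect(Collectors.toList());
  }

  public static void main(String[] args) {
    List<NotificationEvent> hmsLog = List.of(
        new NotificationEvent(1, "CREATE_TABLE", "db", "t"),
        new NotificationEvent(2, "ALTER_TABLE", "db", "t"),
        new NotificationEvent(3, "CREATE_TABLE", "db", "other"),
        new NotificationEvent(4, "DROP_TABLE", "db", "t"),
        new NotificationEvent(5, "CREATE_TABLE", "db", "t"));
    long createEventId = 1;
    System.out.println("fetched: " + fetchAllEventsSince(hmsLog, createEventId).size());
    System.out.println("kept: "
        + eventsOfTypeForTable(hmsLog, createEventId, "CREATE_TABLE", "db", "t"));
  }
}
```

      In this toy log, four events are transferred to keep a single CREATE_TABLE event for db.t; with around 1M events in HMS, nearly all of the transferred data is thrown away.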

      This can take hours when there are many events (e.g. 1M) in HMS. However, NotificationEventRequest can specify an eventTypeSkipList, so catalogd can push the event-type filtering down to the HMS side. On newer Hive versions that include HIVE-27499, catalogd can also specify the table name in the request (IMPALA-12607).
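
      Pushing the type filter to HMS means sending the complement of the wanted type: every other event type goes into the eventTypeSkipList. The sketch below is a simplified model under stated assumptions: `KNOWN_EVENT_TYPES` is an illustrative, incomplete list of HMS event type strings, `NotificationEvent` is a hypothetical stand-in record, and `hmsGetEvents` is a hypothetical simulation of HMS applying the skip list. In real code the list would be set on the eventTypeSkipList field of NotificationEventRequest. Because a skip list is a deny list, event types missing from it still come back, so the sketch keeps the client-side type and table-name checks as a safety net.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Set;
import java.util.stream.Collectors;

// Hypothetical model of server-side event-type filtering via a skip list.
public class EventTypeSkipListSketch {
  // Simplified stand-in for the HMS NotificationEvent Thrift struct (assumption).
  record NotificationEvent(long eventId, String eventType, String dbName, String tableName) {}

  // Illustrative, incomplete list of HMS event type strings (assumption).
  static final List<String> KNOWN_EVENT_TYPES = List.of(
      "CREATE_TABLE", "DROP_TABLE", "ALTER_TABLE", "ADD_PARTITION",
      "DROP_PARTITION", "ALTER_PARTITION", "INSERT", "CREATE_DATABASE",
      "DROP_DATABASE", "ALTER_DATABASE");

  // Builds the skip list: every known type except the ones the caller wants.
  static List<String> buildSkipList(Set<String> wantedTypes) {
    return KNOWN_EVENT_TYPES.stream()
        .filter(t -> !wantedTypes.contains(t))
        .collect(Collectors.toList());
  }

  // Simulates HMS dropping skipped event types before returning events.
  static List<NotificationEvent> hmsGetEvents(List<NotificationEvent> hmsLog,
      long lastEventId, List<String> eventTypeSkipList) {
    List<NotificationEvent> returned = new ArrayList<>();
    for (NotificationEvent e : hmsLog) {
      if (e.eventId() > lastEventId && !eventTypeSkipList.contains(e.eventType())) {
        returned.add(e);
      }
    }
    return returned;
  }

  // Requests with the skip list, then re-checks the type (unknown types are not
  // skipped) and the table name (not pushed down in this sketch).
  static List<NotificationEvent> eventsOfTypeForTable(List<NotificationEvent> hmsLog,
      long lastEventId, String eventType, String dbName, String tableName) {
    List<String> skipList = buildSkipList(Set.of(eventType));
    return hmsGetEvents(hmsLog, lastEventId, skipList).stream()
        .filter(e -> e.eventType().equals(eventType))
        .filter(e -> e.dbName().equalsIgnoreCase(dbName)
            && e.tableName().equalsIgnoreCase(tableName))
        .collect(Collectors.toList());
  }

  public static void main(String[] args) {
    List<NotificationEvent> hmsLog = List.of(
        new NotificationEvent(1, "CREATE_TABLE", "db", "t"),
        new NotificationEvent(2, "ALTER_TABLE", "db", "t"),
        new NotificationEvent(3, "CREATE_TABLE", "db", "other"),
        new NotificationEvent(4, "DROP_TABLE", "db", "t"),
        new NotificationEvent(5, "CREATE_TABLE", "db", "t"));
    List<String> skipList = buildSkipList(Set.of("CREATE_TABLE"));
    System.out.println("returned by HMS: " + hmsGetEvents(hmsLog, 1, skipList).size());
    System.out.println("kept: " + eventsOfTypeForTable(hmsLog, 1, "CREATE_TABLE", "db", "t"));
  }
}
```

      On the same toy log as the client-side case, HMS now returns only the two CREATE_TABLE events instead of four events of mixed types. The remaining table-name filtering is what the request could take over on Hive versions with HIVE-27499 (IMPALA-12607).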

      This JIRA focuses on specifying the eventTypeSkipList when fetching events of a particular type on a table.

          People

            Assignee: Quanlong Huang (stigahuang)
            Reporter: Quanlong Huang (stigahuang)
            Votes: 0
            Watchers: 3
