Uploaded image for project: 'IMPALA'
  1. IMPALA
  2. IMPALA-12463

Allow batching of non consecutive metastore events

    XMLWordPrintableJSON

Details

    • Improvement
    • Status: Resolved
    • Major
    • Resolution: Fixed
    • None
    • Impala 4.4.0
    • Catalog
    • None

    Description

      Currently Impala tries to batch events like partition insert/creation only if:
      1. the next event is for the same table as the previous one
      2. the next event's id is the previous one's + 1
      3. the next event has the same type as the previous one
      (2 can be stricter than 1 if some events were filtered between the two)

      See https://github.com/apache/impala/blob/94f4f1d82461d8f71fbd0d2e9082aa29b5f53a89/fe/src/main/java/org/apache/impala/catalog/events/MetastoreEvents.java#L315

      Another limit is that only events in the same batch from HMS can be merged. Currently 1000 events are polled at the same time: https://github.com/apache/impala/blob/94f4f1d82461d8f71fbd0d2e9082aa29b5f53a89/fe/src/main/java/org/apache/impala/catalog/events/MetastoreEventsProcessor.java#L218
      Making this configurable could be also useful.

      Event batching could be improved by batching all events to the current one if they modify the same table, unless they are "cut" by:
      a. an event on the same table but with a different type
      b. a rename table event where the original or the new name is the same as the current event
      If such an event occurs, the events after that can be only merged to a newer event.

      Attachments

        1. concurrent_metadata_load.py
          3 kB
          Joe McDonnell

        Issue Links

          Activity

            People

              joemcdonnell Joe McDonnell
              csringhofer Csaba Ringhofer
              Votes:
              0 Vote for this issue
              Watchers:
              6 Start watching this issue

              Dates

                Created:
                Updated:
                Resolved: