  Atlas > ATLAS-492 Hive Hook Improvements > ATLAS-568

Parallelize Hive hook operations


Details

    • Type: Sub-task
    • Status: Open
    • Priority: Major
    • Resolution: Unresolved
    • Affects Version/s: 0.7-incubating
    • Fix Version/s: None
    • Component/s: None
    • Labels: None

    Description

      Maintaining the same order of operations that were executed in Hive is also crucial on the Atlas side: if the operations are not applied in order, correctness issues can easily arise in the Atlas repository. For example, dropping a table's columns and then renaming the table, dropping tables, dropping databases, etc. all need to be applied in the same order in which they occurred in the Hive metastore. There are multiple issues that need to be addressed here:

      1. How do we ensure the order of messages on the producer/hook side? (See the producer sketch after this list.)
      2. Once the producer/hook publishes these messages to Kafka, how do we ensure that the order of processing matches the order in which they were published?
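
      A minimal sketch of one way to address ordering on the producer side, assuming all messages for a given entity are sent with the same Kafka key so that they land on the same partition, where Kafka preserves their relative order. The class and method names, the use of the entity's qualified name as the key, and the ATLAS_HOOK topic name are illustrative assumptions, not the actual hook implementation:

      import java.util.Properties;

      import org.apache.kafka.clients.producer.KafkaProducer;
      import org.apache.kafka.clients.producer.ProducerConfig;
      import org.apache.kafka.clients.producer.ProducerRecord;
      import org.apache.kafka.common.serialization.StringSerializer;

      public class OrderedHookPublisher {
          private final KafkaProducer<String, String> producer;

          public OrderedHookPublisher(String bootstrapServers) {
              Properties props = new Properties();
              props.put(ProducerConfig.BOOTSTRAP_SERVERS_CONFIG, bootstrapServers);
              props.put(ProducerConfig.KEY_SERIALIZER_CLASS_CONFIG, StringSerializer.class.getName());
              props.put(ProducerConfig.VALUE_SERIALIZER_CLASS_CONFIG, StringSerializer.class.getName());
              // At most one in-flight request per connection, so client-side retries
              // cannot reorder messages within a partition.
              props.put(ProducerConfig.MAX_IN_FLIGHT_REQUESTS_PER_CONNECTION, 1);
              props.put(ProducerConfig.ACKS_CONFIG, "all");
              this.producer = new KafkaProducer<>(props);
          }

          // All messages for the same entity share the same key, so they go to the
          // same partition and Kafka preserves their relative order.
          public void publish(String entityQualifiedName, String messageJson) {
              producer.send(new ProducerRecord<>("ATLAS_HOOK", entityQualifiedName, messageJson));
          }
      }

      Limiting in-flight requests to one trades some throughput for the guarantee that retries cannot reorder already-sent messages within a partition.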

      One suggested approach is to assign a timestamp to every message on the producer side and to window/batch these messages on the consumer/Atlas server side, so that each batch can be applied in timestamp order.
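
      A minimal sketch of that consumer-side idea, assuming a hypothetical TimestampedMessage wrapper and a fixed-length time window; the actual windowing policy, message format, and hand-off to the repository are open design questions for this ticket:

      import java.util.ArrayList;
      import java.util.Comparator;
      import java.util.List;

      // Hypothetical wrapper: the payload plus the timestamp assigned by the hook.
      class TimestampedMessage {
          final long producerTimestamp;
          final String payload;

          TimestampedMessage(long producerTimestamp, String payload) {
              this.producerTimestamp = producerTimestamp;
              this.payload = payload;
          }
      }

      // Buffers incoming messages for a fixed time window, then flushes them in
      // producer-timestamp order so the repository sees Hive operations in the
      // order they actually ran.
      class WindowedOrderingConsumer {
          private final long windowMillis;
          private final List<TimestampedMessage> buffer = new ArrayList<>();
          private long windowStart = System.currentTimeMillis();

          WindowedOrderingConsumer(long windowMillis) {
              this.windowMillis = windowMillis;
          }

          void onMessage(TimestampedMessage msg) {
              buffer.add(msg);
              if (System.currentTimeMillis() - windowStart >= windowMillis) {
                  flush();
              }
          }

          private void flush() {
              buffer.sort(Comparator.comparingLong(m -> m.producerTimestamp));
              for (TimestampedMessage msg : buffer) {
                  apply(msg);   // hand off to the repository in timestamp order
              }
              buffer.clear();
              windowStart = System.currentTimeMillis();
          }

          private void apply(TimestampedMessage msg) {
              // Placeholder for the actual repository update.
              System.out.println("Applying: " + msg.payload);
          }
      }

      The window length bounds how long a message waits before being applied; how to handle a message that arrives after its window has already flushed is one of the questions this issue would need to settle.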

      Attachments

        Activity

          People

            Assignee: Unassigned
            Reporter: Suma Shivaprasad (suma.shivaprasad)
            Votes: 0
            Watchers: 4

            Dates

              Created:
              Updated: