Details
Type: Improvement
Status: Resolved
Priority: Major
Resolution: Fixed
2.1.0
None
Description
Background
The Atlas hook for HiveServer2 (HS2) sends notification messages for both metadata (DDL operations) and lineage (DML operations).
The Hive Metastore (HMS) hook already sends metadata notifications to Atlas, and all of these correspond to DDL operations.
As a result, Atlas receives duplicate messages about the same object updates.
Atlas processes these duplicates like any other message, which adds processing time and message volume.
Incorrect data can also end up in Atlas if the messages from HMS and HS2 arrive in a different order than the operations occurred.
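The ordering risk can be illustrated with a toy model. The sketch below assumes a store that applies every message in arrival order and lets the latest message win; it is a simplified, hypothetical model, not how Atlas actually persists entities. Each message is `(source, table, columns)`, where `columns=None` stands for a drop.

```python
def apply_in_arrival_order(messages):
    """Toy metadata store (hypothetical): each message overwrites the table state.

    A message with columns=None drops the table; any other message
    (re)creates it with the given column list. Messages are applied
    strictly in the order they arrive, so the last one wins.
    """
    tables = {}
    for _source, table, columns in messages:
        if columns is None:
            tables.pop(table, None)  # drop: remove the table if present
        else:
            tables[table] = list(columns)  # create/alter: overwrite the state
    return tables


# HMS alone: CREATE then DROP leaves nothing behind.
hms_only = [("HMS", "t", ["a"]), ("HMS", "t", None)]
print(apply_in_arrival_order(hms_only))  # {}

# A delayed duplicate CREATE from HS2 arriving after the DROP resurrects the table.
with_duplicate = hms_only + [("HS2", "t", ["a"])]
print(apply_in_arrival_order(with_duplicate))  # {'t': ['a']}
```

With only one source sending DDL messages, there is no second copy of an event that can arrive late and overwrite newer state.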
Solution
With this improvement, the HS2 hook sends only lineage messages. All DDL (schema definition) messages continue to be sent from the HMS hook (no change here).
This also reduces the volume of messages sent to Atlas from HiveServer2 and improves performance, because Atlas no longer processes duplicate messages.
The improvement is enabled through a configuration parameter, so the existing behavior is unchanged unless the parameter is set.
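A minimal sketch of the configuration-gated filtering described above follows. It assumes each hook event can be classified as either DDL or DML (lineage); the names `OperationKind`, `HookEvent`, `events_to_notify` and the flag `ignore_ddl_in_hs2` are all hypothetical stand-ins, not the actual Atlas hook classes or the real configuration property name.

```python
from dataclasses import dataclass
from enum import Enum


class OperationKind(Enum):
    DDL = "ddl"  # schema definition (CREATE/ALTER/DROP); also reported by the HMS hook
    DML = "dml"  # data operations that produce lineage; reported only by HS2


@dataclass(frozen=True)
class HookEvent:
    operation: str  # hypothetical operation name, e.g. "CREATETABLE" or "QUERY"
    kind: OperationKind


def events_to_notify(events, ignore_ddl_in_hs2=False):
    """Return the events the HS2 hook would forward to Atlas.

    ignore_ddl_in_hs2 stands in for the configuration parameter. When it is
    False (the default), every event is forwarded, preserving existing behavior.
    """
    if not ignore_ddl_in_hs2:
        return list(events)
    # DDL metadata still reaches Atlas through the HMS hook, so HS2 sends lineage only.
    return [e for e in events if e.kind is OperationKind.DML]


events = [HookEvent("CREATETABLE", OperationKind.DDL), HookEvent("QUERY", OperationKind.DML)]
print([e.operation for e in events_to_notify(events)])  # ['CREATETABLE', 'QUERY']
print([e.operation for e in events_to_notify(events, ignore_ddl_in_hs2=True)])  # ['QUERY']
```

Defaulting the flag to off mirrors the stated goal that existing deployments keep their current behavior until they opt in.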
Issue Links
- fixes: ATLAS-4152 [Atlas: Spooling] Multiple entries are created for same table when the table is dropped while kafka is down (Resolved)