Details
Type: Improvement
Status: Open
Priority: Major
Resolution: Unresolved
2.2.0
None
Description
Currently, the 'Stats' function in the web UI shows some details about consumption from the ATLAS_HOOK Kafka topic, where changes from the Hive Metastore arrive.
However, by far the most important metric is not available: the lag of the Atlas server's consumer group in consuming Hive updates.
Monitoring this lag is very important, because trust in Atlas is greatly undermined when changes are not reflected in Atlas within seconds. On numerous occasions, ATLAS_HOOK consumption slowed down silently and Atlas fell tens of thousands of messages (about 2 days' worth) behind.
The stats page should include a new metric for this lag, so that it is easy to quickly identify a possible reason for slow Atlas updates.
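
To make the requested metric concrete, here is a minimal sketch of how consumer lag is commonly defined: for each partition, the log end offset minus the consumer group's committed offset, summed over all partitions. This is not Atlas code. The function names and the sample offsets are hypothetical, and treating a partition with no committed offset as "nothing consumed yet" is an assumption made for illustration. In a real deployment the two offset maps would have to be fetched from Kafka for the ATLAS_HOOK topic and the Atlas consumer group.

```python
from typing import Dict, Optional


def partition_lag(end_offsets: Dict[int, int],
                  committed_offsets: Dict[int, Optional[int]]) -> Dict[int, int]:
    """Return the lag per partition: log end offset minus committed offset."""
    lag = {}
    for partition, end in end_offsets.items():
        committed = committed_offsets.get(partition)
        if committed is None:
            # Assumption: no committed offset means nothing was consumed yet.
            committed = 0
        # Clamp at zero in case the two offsets were read at different times.
        lag[partition] = max(end - committed, 0)
    return lag


def total_lag(end_offsets: Dict[int, int],
              committed_offsets: Dict[int, Optional[int]]) -> int:
    """Return the lag summed over all partitions of the topic."""
    return sum(partition_lag(end_offsets, committed_offsets).values())


# Hypothetical offsets for a three-partition topic.
end = {0: 15200, 1: 14850, 2: 15010}
committed = {0: 15200, 1: 9850, 2: None}

print(partition_lag(end, committed))
print(total_lag(end, committed))
```

A single total is convenient for a stats page, while the per-partition breakdown helps spot one stuck partition behind an otherwise healthy total.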