Details
Type: Improvement
Status: Resolved
Priority: Major
Resolution: Fixed
Description
Currently, catalogd's web UI shows only global metrics for the event-processor:
Name | Value | Description |
---|---|---|
events-processor.avg-events-fetch-duration | 989ms | Average time taken to fetch a batch of metastore events |
events-processor.avg-events-process-duration | 0 | Average time taken to process a batch of events received from metastore |
events-processor.events-received | 0 | Total number of metastore events received |
events-processor.events-received-15min-rate | 0.000000 | Exponentially weighted moving average (EWMA) of number of events received in last 15 min |
events-processor.events-received-1min-rate | 0.000000 | Exponentially weighted moving average (EWMA) of number of events received in last 1 min |
events-processor.events-received-5min-rate | 0.000000 | Exponentially weighted moving average (EWMA) of number of events received in last 5 min |
events-processor.events-skipped | 0 | Total number of metastore events skipped |
events-processor.last-synced-event-id | 734979 | Last metastore event id that the catalog server processed and synced to |
events-processor.status | ACTIVE | Metastore event processor status |
Some of these metrics could also be tracked at the table level, e.g. avg-events-process-duration (also extended to cover the last 30 min, 1 h, and 24 h).
This helps users find which tables are causing the event-processor to lag behind, so they can disable the event-processor on those tables as a workaround.
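A minimal sketch of how a per-table average processing duration over several windows might be tracked. It is not taken from Impala's code: the class `TableEventDurations`, its methods, and the choice to keep raw samples for up to 24 h are all assumptions made for illustration.

```java
import java.util.ArrayDeque;
import java.util.Deque;
import java.util.HashMap;
import java.util.Map;

// Hypothetical sketch: per-table event-processing durations over sliding windows.
// None of these names come from Impala's code base.
class TableEventDurations {
  // Longest window we report on (24h); older samples are dropped.
  static final long MAX_WINDOW_MS = 24L * 60 * 60 * 1000;

  // One sample: when the event was processed and how long processing took.
  private static final class Sample {
    final long timestampMs;
    final long durationMs;

    Sample(long timestampMs, long durationMs) {
      this.timestampMs = timestampMs;
      this.durationMs = durationMs;
    }
  }

  private final Map<String, Deque<Sample>> samplesByTable = new HashMap<>();

  // Records one processed event for a table and evicts samples older than 24h.
  synchronized void record(String table, long durationMs, long nowMs) {
    Deque<Sample> samples =
        samplesByTable.computeIfAbsent(table, t -> new ArrayDeque<>());
    samples.addLast(new Sample(nowMs, durationMs));
    while (!samples.isEmpty()
        && nowMs - samples.peekFirst().timestampMs > MAX_WINDOW_MS) {
      samples.removeFirst();
    }
  }

  // Average duration (ms) of events processed within the last windowMs.
  // Returns 0 when the table has no samples in that window.
  synchronized double avgDurationMs(String table, long windowMs, long nowMs) {
    Deque<Sample> samples = samplesByTable.get(table);
    if (samples == null) return 0.0;
    long sum = 0;
    int count = 0;
    for (Sample s : samples) {
      if (nowMs - s.timestampMs <= windowMs) {
        sum += s.durationMs;
        count++;
      }
    }
    return count == 0 ? 0.0 : (double) sum / count;
  }
}
```

Calling `avgDurationMs` with windows of 30 min, 1 h, and 24 h for each table would give the three values described above; sorting tables by the result is one way to spot the tables that slow the event-processor down. Keeping raw samples is simple but memory-hungry, so a real implementation would more likely use bucketed or decaying aggregates.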
This Jira also tracks a new metric at the events-processor level, 'events-consuming-delay', to gauge how long the events-processor takes to consume the generated events.
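One possible reading of 'events-consuming-delay' is the gap between the time the metastore generated an event and the time the event-processor finished consuming it. The sketch below illustrates that reading only; the class `EventsConsumingDelay`, its method names, and the assumption that event times arrive in seconds since the epoch are hypothetical and not confirmed by this issue.

```java
// Hypothetical sketch; assumes delay = processing time minus event creation time.
class EventsConsumingDelay {
  private long lastDelayMs = 0;

  // eventTimeSec: when the event was generated (seconds since epoch, assumed unit).
  // processedAtMs: when the event-processor finished consuming it (ms since epoch).
  synchronized void onEventProcessed(long eventTimeSec, long processedAtMs) {
    // Clamp at zero so clock skew between hosts never yields a negative delay.
    lastDelayMs = Math.max(0, processedAtMs - eventTimeSec * 1000L);
  }

  // Most recently observed delay, suitable for exposing as a gauge.
  synchronized long getLastDelayMs() {
    return lastDelayMs;
  }
}
```

A steadily growing value would indicate that events are being generated faster than they are consumed.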