Kafka / KAFKA-921

Expose max lag mbean for consumers and replica fetchers

    Details

    • Type: Bug
    • Status: Closed
    • Priority: Major
    • Resolution: Fixed
    • Affects Version/s: 0.8.0
    • Fix Version/s: 0.8.0
    • Component/s: None
    • Labels: None

      Description

      We have a large number of consumer mbeans whose names are derived from the consumer id, the broker being fetched from, the fetcher id, etc. This makes basic monitoring of consumer/replica fetcher lag difficult, since the mbean to monitor can change. A more useful metric for monitoring purposes is the maximum lag across all fetchers.
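To illustrate the idea, here is a minimal sketch in Python (not the Scala code from the patch) of collapsing per-fetcher lag readings into a single max lag value. The `max_lag` helper and the sample keys are hypothetical; the keys only mimic names built from consumer id, broker, and fetcher id.

```python
from typing import Mapping


def max_lag(lag_by_fetcher: Mapping[str, int]) -> int:
    """Return the largest lag across all fetchers, or 0 if there are none."""
    return max(lag_by_fetcher.values(), default=0)


# Hypothetical per-fetcher lag readings. The keys are illustrative and are
# not real Kafka mbean names; the point is that they change as consumers,
# brokers, and fetchers come and go, while the aggregate stays stable.
lags = {
    "group1_host-a-broker-1-fetcher-0": 120,
    "group1_host-a-broker-2-fetcher-0": 4500,
    "group1_host-a-broker-2-fetcher-1": 0,
}

worst = max_lag(lags)
```

A monitoring system can then alert on this one value without knowing which fetcher names currently exist.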

      1. KAFKA-921-v1.patch
        5 kB
        Joel Koshy
      2. KAFKA-921-v2.patch
        3 kB
        Joel Koshy
      3. KAFKA-921-v3.patch
        5 kB
        Joel Koshy

        Activity

        Joel Koshy added a comment -

        Thanks for the reviews. Committed with the minor change, i.e., the metric prefix is "Replica" instead of "Replica-<id>".

        Neha Narkhede added a comment -

        +1 on patch v3.

        Jun Rao added a comment -

        Thanks for patch v3. +1. Just one minor comment.

        1. ReplicaFetcherManager: It seems that we can just use "Replica", instead of "Replica-" + brokerConfig.brokerId as the metric prefix, since the metric is local to the broker.

        Joel Koshy added a comment -

        One caveat with this approach: if a fetcher is wedged for any reason, the reported lag is inaccurate, since it depends on getting the high watermark from fetch responses. So to check on the health of a consumer you would need to look at both the max lag and the min fetch rate across all fetchers.
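A minimal sketch of such a combined check, in Python. The function name, the thresholds, and the `FetcherHealth` type are all assumptions for illustration and are not part of the patch; sensible thresholds depend on the workload.

```python
from dataclasses import dataclass


@dataclass
class FetcherHealth:
    healthy: bool
    reason: str


def check_fetchers(max_lag: int,
                   min_fetch_rate: float,
                   lag_threshold: int = 10_000,   # assumed value
                   rate_floor: float = 0.1) -> FetcherHealth:  # assumed value
    """Judge consumer health from max lag and min fetch rate together."""
    # Check the fetch rate first: a wedged fetcher stops receiving fetch
    # responses, so its lag reading can look fine while being stale.
    if min_fetch_rate < rate_floor:
        return FetcherHealth(False, "fetcher stalled; reported lag may be stale")
    if max_lag > lag_threshold:
        return FetcherHealth(False, "max lag above threshold")
    return FetcherHealth(True, "ok")
```

For example, `check_fetchers(max_lag=50, min_fetch_rate=0.0)` is reported as unhealthy even though the lag looks small, which is the case the max lag value alone would miss.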

        Joel Koshy added a comment -

        Yes - I think that would be better. Moved it to AbstractFetcherManager. So depending on whether you are looking at replica fetchers or consumer fetchers, the MaxLag mbean will show up in ReplicaFetcherManager or ConsumerFetcherManager respectively.
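The shape of that refactoring can be sketched as follows. This is a Python illustration, not the Scala code from the patch: the class names and "MaxLag" come from this thread, while the `gauges` and `add_fetcher` methods, the "-" separator in the metric name, and the use of the consumer id as the consumer-side prefix are assumptions.

```python
from typing import Callable, Dict


class AbstractFetcherManager:
    """Owns the max lag gauge once; subclasses only supply a metric prefix."""

    def __init__(self, metric_prefix: str) -> None:
        self.metric_prefix = metric_prefix
        # Maps a fetcher id to a callable returning that fetcher's current lag.
        self._lag_sources: Dict[str, Callable[[], int]] = {}

    def add_fetcher(self, fetcher_id: str, lag_fn: Callable[[], int]) -> None:
        self._lag_sources[fetcher_id] = lag_fn

    def max_lag(self) -> int:
        # Evaluated on every read, so it follows fetchers being added later.
        return max((fn() for fn in self._lag_sources.values()), default=0)

    def gauges(self) -> Dict[str, Callable[[], int]]:
        # The "<prefix>-MaxLag" naming is an assumption for illustration.
        return {f"{self.metric_prefix}-MaxLag": self.max_lag}


class ReplicaFetcherManager(AbstractFetcherManager):
    def __init__(self) -> None:
        # The metric is local to the broker, so no broker id is needed.
        super().__init__("Replica")


class ConsumerFetcherManager(AbstractFetcherManager):
    def __init__(self, consumer_id: str) -> None:
        # Hypothetical choice of prefix for the consumer side.
        super().__init__(consumer_id)
```

Keeping the gauge in the base class means the aggregation logic exists in one place, and each manager differs only in the name under which the value is exposed.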

        Jun Rao added a comment -

        Thanks for the patch. Could we add the max lag in one place in AbstractFetcherThread and AbstractFetcherManager? We can pass in the proper metrics name.

        Joel Koshy added a comment -

        This provides a max lag mbean for both the consumer fetcher manager and the replica fetcher manager, although I think it is more useful for monitoring consumers. For replica fetchers we need to closely monitor all of them anyway, and that set of mbeans is static. I can reduce the scope to just consumers if others agree.


          People

          • Assignee: Unassigned
          • Reporter: Joel Koshy
          • Votes: 0
          • Watchers: 3
