Kafka / KAFKA-5120

Several controller metrics block if controller lock is held by another thread


Details

    • Type: Bug
    • Status: Resolved
    • Priority: Minor
    • Resolution: Fixed
    • Affects Version/s: 0.10.2.0
    • Fix Version/s: None
    • Component/s: controller, metrics
    • Labels: None

    Description

      We have been tracking latency issues surrounding queries to Controller MBeans. Upon digging into the root causes, we discovered that several metrics acquire the controller lock within the gauge.

      The affected metrics are:

      • ActiveControllerCount
      • OfflinePartitionsCount
      • PreferredReplicaImbalanceCount

      If the controller is holding the lock when an MBean request arrives, the thread executing the request blocks until the controller releases the lock.
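
The blocking behaviour described above can be reproduced in isolation. The following is a minimal sketch, not Kafka's controller code: the class `LockedGaugeDemo`, its fields, and its methods are hypothetical stand-ins, and a plain `Supplier` plays the role of the metrics gauge. It measures how long a single gauge read takes while another thread holds the lock.

```java
import java.util.concurrent.CountDownLatch;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.locks.ReentrantLock;
import java.util.function.Supplier;

// Hypothetical stand-in for a controller whose gauge takes the controller lock.
class LockedGaugeDemo {
    private final ReentrantLock controllerLock = new ReentrantLock();
    private int offlinePartitions = 0; // guarded by controllerLock

    // Gauge-style read: it must acquire the lock before it can return a value.
    final Supplier<Integer> offlinePartitionsGauge = () -> {
        controllerLock.lock();
        try {
            return offlinePartitions;
        } finally {
            controllerLock.unlock();
        }
    };

    // Simulates a long controller operation that holds the lock for holdMillis.
    void longRunningOperation(long holdMillis, CountDownLatch lockHeld) {
        controllerLock.lock();
        try {
            offlinePartitions = 3;
            lockHeld.countDown(); // signal that the lock is now held
            Thread.sleep(holdMillis);
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
        } finally {
            controllerLock.unlock();
        }
    }

    // Returns how long (in ms) one gauge read takes while the lock is held elsewhere.
    static long measureBlockedReadMillis(long holdMillis) {
        LockedGaugeDemo demo = new LockedGaugeDemo();
        CountDownLatch lockHeld = new CountDownLatch(1);
        Thread controller = new Thread(() -> demo.longRunningOperation(holdMillis, lockHeld));
        controller.start();
        try {
            lockHeld.await();
            long start = System.nanoTime();
            demo.offlinePartitionsGauge.get(); // blocks until the controller thread unlocks
            long elapsed = TimeUnit.NANOSECONDS.toMillis(System.nanoTime() - start);
            controller.join();
            return elapsed;
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            throw new IllegalStateException(e);
        }
    }

    public static void main(String[] args) {
        System.out.println("gauge read blocked for ~" + measureBlockedReadMillis(500) + " ms");
    }
}
```

In this sketch the read latency tracks the lock hold time, which is the same pattern a JMX client would observe when polling such a gauge.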

      We discovered this in a cluster where the controller was holding the lock for extended periods of time for normal operations. We have documented this issue in KAFKA-5116.

      Several possible solutions exist:

      • Remove the lock from inside these Gauges.
      • Store and update the metric values in AtomicLongs.
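
The second option could look roughly like the sketch below. It is an illustration under assumptions rather than a proposed patch: `AtomicGaugeDemo`, `updateOfflinePartitions`, and `readWhileLockHeld` are hypothetical names, and a `Supplier` again stands in for the metrics gauge. The controller publishes the value into an `AtomicLong` while it holds the lock, and the gauge reads that `AtomicLong` without touching the lock.

```java
import java.util.concurrent.CountDownLatch;
import java.util.concurrent.atomic.AtomicLong;
import java.util.concurrent.locks.ReentrantLock;
import java.util.function.Supplier;

// Hypothetical stand-in for a controller that publishes metric values via AtomicLong.
class AtomicGaugeDemo {
    private final ReentrantLock controllerLock = new ReentrantLock();
    private final AtomicLong offlinePartitionsCount = new AtomicLong(0);

    // Gauge-style read: lock-free, returns the most recently published value.
    final Supplier<Long> offlinePartitionsGauge = offlinePartitionsCount::get;

    // Called by the controller whenever partition state changes (assumed hook).
    void updateOfflinePartitions(long newCount) {
        controllerLock.lock();
        try {
            // ... controller state would be mutated here ...
            offlinePartitionsCount.set(newCount); // publish for metric readers
        } finally {
            controllerLock.unlock();
        }
    }

    // Reads the gauge while a second thread is holding the controller lock.
    static long readWhileLockHeld(long publishedValue) {
        AtomicGaugeDemo demo = new AtomicGaugeDemo();
        demo.updateOfflinePartitions(publishedValue);
        CountDownLatch lockHeld = new CountDownLatch(1);
        CountDownLatch release = new CountDownLatch(1);
        Thread controller = new Thread(() -> {
            demo.controllerLock.lock();
            try {
                lockHeld.countDown();
                release.await(); // keep holding the lock until the read has finished
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt();
            } finally {
                demo.controllerLock.unlock();
            }
        });
        controller.start();
        try {
            lockHeld.await();
            return demo.offlinePartitionsGauge.get(); // does not wait for the lock
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            throw new IllegalStateException(e);
        } finally {
            release.countDown();
        }
    }

    public static void main(String[] args) {
        System.out.println("gauge value read while lock held: " + readWhileLockHeld(7));
    }
}
```

One tradeoff of this approach is that a reader may see a value that is slightly stale relative to an in-progress controller operation, and every code path that changes the underlying state has to remember to update the counter.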

      Modifying the ActiveControllerCount metric seems straightforward, while the other two metrics seem more involved.

      We're happy to contribute a patch, but wanted to discuss potential solutions and their tradeoffs before proceeding.

          People

            Assignee: Unassigned
            Reporter: Tim Carey-Smith (halorgium)
            Votes: 0
            Watchers: 4
