KAFKA-17061

KafkaController takes long time to connect to newly added broker after registration on large cluster


Details

    • Type: Improvement
    • Status: Resolved
    • Priority: Major
    • Resolution: Fixed
    • Affects Version/s: None
    • Fix Version/s: 3.9.0
    • Component/s: None
    • Labels: None

    Description

      Environment

      • Kafka version: 3.3.2
      • Cluster: ~200 brokers
      • Total number of partitions: 40k
      • ZK-based cluster

      Phenomenon

      When a broker left the cluster because of a long stop-the-world (STW) pause and rejoined after a while, the controller took about 6 seconds to connect to the broker after its znode was registered, which caused a significant message delivery delay.

      [2024-06-22 23:59:38,202] INFO [Controller id=1] Newly added brokers: 2, deleted brokers: , bounced brokers: , all live brokers: 1,... (kafka.controller.KafkaController)
      [2024-06-22 23:59:38,203] DEBUG [Channel manager on controller 1]: Controller 1 trying to connect to broker 2 (kafka.controller.ControllerChannelManager)
      [2024-06-22 23:59:38,205] INFO [RequestSendThread controllerId=1] Starting (kafka.controller.RequestSendThread)
      [2024-06-22 23:59:38,205] INFO [Controller id=1] New broker startup callback for 2 (kafka.controller.KafkaController)
      [2024-06-22 23:59:44,524] INFO [RequestSendThread controllerId=1] Controller 1 connected to broker-2:9092 (id: 2 rack: rack-2) for sending state change requests (kafka.controller.RequestSendThread)
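
The delay can be read off the log above as the gap between the "trying to connect" line and the "connected" line. A minimal sketch of that arithmetic in Python; `parse_log_ts` is a hypothetical helper written for this illustration, and the two log lines are shortened copies of the ones above:

```python
from datetime import datetime

# Timestamp layout used by the log lines above, e.g. "2024-06-22 23:59:38,203".
LOG_TS_FORMAT = "%Y-%m-%d %H:%M:%S,%f"


def parse_log_ts(line: str) -> datetime:
    """Extract the bracketed timestamp at the start of a log line (hypothetical helper)."""
    raw = line[line.index("[") + 1 : line.index("]")]
    return datetime.strptime(raw, LOG_TS_FORMAT)


# Shortened copies of the two relevant log lines.
connect_attempt = (
    "[2024-06-22 23:59:38,203] DEBUG [Channel manager on controller 1]: "
    "Controller 1 trying to connect to broker 2"
)
connected = (
    "[2024-06-22 23:59:44,524] INFO [RequestSendThread controllerId=1] "
    "Controller 1 connected to broker-2:9092"
)

delay_seconds = (parse_log_ts(connected) - parse_log_ts(connect_attempt)).total_seconds()
print(f"controller-to-broker connect delay: {delay_seconds:.3f} s")
```

For these two lines the difference is 6.321 seconds, which matches the roughly 6 seconds reported.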
      

      Analysis

      The flame graph captured at that time shows that `liveBrokerIds`, called from `isReplicaOnline`, accounts for a significant share of the time spent in the `addUpdateMetadataRequestForBrokers` invocation on broker startup.
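
The issue text does not say why `liveBrokerIds` is expensive. One plausible explanation, stated here as an assumption, is that the set of live broker IDs is derived again on every call, so a per-replica membership check pays a cost proportional to the number of brokers. The sketch below is a toy Python model of that pattern, not Kafka's actual Scala code: the class `ControllerContextModel`, its fields, the two lookup variants, and the lookup count are all hypothetical.

```python
import time


class ControllerContextModel:
    """Toy model for illustration only; not Kafka's ControllerContext."""

    def __init__(self, live_broker_epochs, shutting_down_broker_ids):
        self.live_broker_epochs = dict(live_broker_epochs)
        self.shutting_down_broker_ids = set(shutting_down_broker_ids)

    def live_broker_ids(self):
        # Assumed pattern: a fresh set is built on every call, O(number of brokers).
        return set(self.live_broker_epochs) - self.shutting_down_broker_ids

    def is_replica_online_slow(self, broker_id):
        # Each lookup rebuilds the whole set before testing membership.
        return broker_id in self.live_broker_ids()

    def is_replica_online_fast(self, broker_id):
        # Same answer with two O(1) membership checks and no intermediate set.
        return (
            broker_id in self.live_broker_epochs
            and broker_id not in self.shutting_down_broker_ids
        )


def time_lookups(check, broker_ids, lookups):
    """Run `lookups` membership checks round-robin and return elapsed seconds."""
    start = time.perf_counter()
    for i in range(lookups):
        check(broker_ids[i % len(broker_ids)])
    return time.perf_counter() - start


# 200 brokers as in the reported environment; broker 7 is marked as
# shutting down purely to exercise that branch (assumption).
broker_ids = list(range(1, 201))
ctx = ControllerContextModel({b: 0 for b in broker_ids}, {7})

# One lookup per partition is an arbitrary choice for this sketch.
slow = time_lookups(ctx.is_replica_online_slow, broker_ids, 40_000)
fast = time_lookups(ctx.is_replica_online_fast, broker_ids, 40_000)
print(f"rebuild set per lookup: {slow:.3f} s, direct membership: {fast:.3f} s")
```

Both variants return the same result for every broker; only the per-lookup cost differs. Whether the actual patch took this form is not stated in the issue.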

      Attachments

        1. screenshot-flame-patched.png
          931 kB
          Haruki Okada
        2. screenshot-flame.png
          1.49 MB
          Haruki Okada
        3. image-2024-07-02-17-24-11-861.png
          1.50 MB
          Haruki Okada
        4. image-2024-07-02-17-22-06-100.png
          1.51 MB
          Haruki Okada
        5. flame-patched.html
          156 kB
          Haruki Okada
        6. flame.html
          138 kB
          Haruki Okada

            People

              Assignee: Haruki Okada (ocadaruma)
              Reporter: Haruki Okada (ocadaruma)