Uploaded image for project: 'Kafka'
  1. Kafka
  2. KAFKA-7537

Only include live brokers in the UpdateMetadataRequest sent to existing brokers if there is no change in the partition states

    XMLWordPrintableJSON

    Details

    • Type: Improvement
    • Status: Resolved
    • Priority: Major
    • Resolution: Fixed
    • Affects Version/s: None
    • Fix Version/s: 2.2.0
    • Component/s: controller
    • Labels:
      None

      Description

      Currently if when brokers join/leave the cluster without any partition states changes, controller will send out UpdateMetadataRequests containing the states of all partitions to all brokers. But for existing brokers in the cluster, the metadata diff between controller and the broker should only be the "live_brokers" info. Only the brokers with empty metadata cache need the full UpdateMetadataRequest. Sending the full UpdateMetadataRequest to all brokers can place nonnegligible memory pressure on the controller side.

      Let's say in total we have N brokers, M partitions in the cluster and we want to add 1 brand new broker in the cluster. With RF=2, the memory footprint per partition in the UpdateMetadataRequest is ~200 Bytes. In the current controller implementation, if each of the N RequestSendThreads serializes and sends out the UpdateMetadataRequest at roughly the same time (which is very likely the case), we will end up using (N+1)*M*200B. In a large kafka cluster, we can have:

      N=99
      M=100k
      
      Memory usage to send out UpdateMetadataRequest to all brokers:
      100 * 100K * 200B = 2G
      
      However, we only need to send out full UpdateMetadataRequest to the newly added broker. We only need to include live broker ids (4B * 100 brokers) in the UpdateMetadataRequest sent to the existing 99 brokers. So the amount of data that is actully needed will be:
      1 * 100K * 200B + 99 * (100 * 4B) = ~21M
      
      
      We will can potentially reduce 2G / 21M = ~95x memory footprint as well as the data tranferred in the network.

       

      This issue kind of hurts the scalability of a kafka cluster. KIP-380 and KAFKA-7186 also help to further reduce the controller memory footprint.

       

      In terms of implementation, we can keep some in-memory state in the controller side to differentiate existing brokers and uninitialized brokers (e.g. brand new brokers) so that if there is no change in partition states, we only send out live brokers info to existing brokers.

       

        Attachments

          Issue Links

            Activity

              People

              • Assignee:
                hzxa21 Zhanxiang (Patrick) Huang
                Reporter:
                hzxa21 Zhanxiang (Patrick) Huang
              • Votes:
                0 Vote for this issue
                Watchers:
                3 Start watching this issue

                Dates

                • Created:
                  Updated:
                  Resolved: