Details
-
Bug
-
Status: Resolved
-
Major
-
Resolution: Fixed
-
None
-
None
-
None
Description
Today KafkaController call getAllReplicasOnBroker on broker failure and new broker start up to get all the replicas that broker is holding (or suppose to hold). This function actually issue a read on each topic's partition znodes. With large number of topic/partitions this could seriously increase the latency of handling broker failure and new broker startup.
On the other hand, ControllerContext maintains a partitionReplicaAssignment cache, which is designed to keep the most updated partition replica assignment according to ZK. So instead of reading from ZK, we could just read from the local cache, given that partitionReplicaAssignment is guaranteed to be up-to-date.