We now have a working solution for rack-aware assignment. It is based on the current patch for this JIRA, with some improvements. The key ideas of the solution are:
- Rack ID is a String instead of an integer
- For replica assignment, add an extra parameter of type Map[Int, String] to the assignReplicasToBrokers() method, which maps broker ID to rack ID
- Before doing the rack-aware assignment, sort the broker list so that brokers are interlaced by rack. In other words, adjacent brokers should not be in the same rack if possible. For example, assume 6 brokers mapped to 3 racks:
0 -> "rack1", 1 -> "rack1", 2 -> "rack2", 3 -> "rack2", 4 -> "rack3", 5 -> "rack3"
The sorted broker list could be (0, 2, 4, 1, 3, 5)
- Apply the same assignment algorithm to assign replicas, with the addition of skipping a broker if its rack is already used for the same partition (similar to what is done in the current patch)
The benefit of this approach is that replica distribution is kept as even as possible across all racks and brokers.
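To make the interlacing and the rack-skipping step concrete, here is a minimal sketch. It is not the code from the patch: the names RackAwareSketch, interlaceByRack and assign are hypothetical, the plain round-robin walk stands in for the real logic in assignReplicasToBrokers(), and it assumes the replication factor does not exceed the number of racks.

```scala
object RackAwareSketch {
  // Hypothetical helper: order brokers so that adjacent entries are in
  // different racks where possible, by taking one broker per rack in turn.
  def interlaceByRack(brokerRack: Map[Int, String]): Seq[Int] = {
    // Broker IDs grouped per rack; racks and IDs sorted for a deterministic result.
    val perRack: Seq[Seq[Int]] = brokerRack.toSeq
      .groupBy { case (_, rack) => rack }
      .toSeq
      .sortBy { case (rack, _) => rack }
      .map { case (_, entries) => entries.map(_._1).sorted }
    val longest = if (perRack.isEmpty) 0 else perRack.map(_.size).max
    for {
      i <- 0 until longest
      brokers <- perRack
      if i < brokers.size
    } yield brokers(i)
  }

  // Simplified stand-in for the assignment step (an assumption, not Kafka's
  // actual algorithm): partition p starts at position p of the interlaced
  // list and walks forward, skipping any broker whose rack is already used.
  def assign(brokerRack: Map[Int, String],
             partitions: Int,
             replicationFactor: Int): Map[Int, Seq[Int]] = {
    val brokers = interlaceByRack(brokerRack)
    val numRacks = brokerRack.values.toSet.size
    require(replicationFactor <= numRacks,
      "this sketch assumes replication factor <= number of racks")
    (0 until partitions).map { p =>
      val candidates = brokers.indices.map(i => brokers((p + i) % brokers.size))
      val chosen = candidates.foldLeft(Vector.empty[Int]) { (acc, broker) =>
        val usedRacks = acc.map(brokerRack).toSet
        if (acc.size < replicationFactor && !usedRacks.contains(brokerRack(broker)))
          acc :+ broker
        else acc
      }
      p -> chosen
    }.toMap
  }
}
```

With the 6-broker example above, interlaceByRack returns (0, 2, 4, 1, 3, 5), and assign with 6 partitions and replication factor 3 gives partition 0 the replicas (0, 2, 4) and partition 1 the replicas (2, 4, 1). Every partition spans all three racks and every broker holds three replicas.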
With regard to KAFKA-1792, an easy solution is to restrict replica movement to within the same rack, which I think should work in most practical cases. It also has the added benefit that replicas usually move faster within a rack. So basically we can apply the algorithm described in KAFKA-1792 to each rack separately. For example, if there are three racks, apply the algorithm three times, each time with the broker list and assignment for that specific rack. Again, we assume the broker-to-rack mapping is available in the method signature.
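The per-rack application could look roughly like the sketch below. The KAFKA-1792 algorithm itself is not implemented here. It is passed in as the rebalance function, and PerRackRebalanceSketch, rebalancePerRack and the Assignment alias are hypothetical names used only for illustration.

```scala
object PerRackRebalanceSketch {
  // partition -> replica broker IDs
  type Assignment = Map[Int, Seq[Int]]

  // Runs `rebalance` once per rack, each time with only that rack's brokers
  // and the part of the current assignment that lives on them, then merges
  // the per-rack results back into one assignment.
  def rebalancePerRack(brokerRack: Map[Int, String],
                       current: Assignment,
                       rebalance: (Seq[Int], Assignment) => Assignment): Assignment = {
    val brokersByRack: Seq[(String, Seq[Int])] = brokerRack.toSeq
      .groupBy(_._2)
      .toSeq
      .sortBy(_._1)
      .map { case (rack, entries) => rack -> entries.map(_._1).sorted }

    val perRackResults: Seq[Assignment] = brokersByRack.map { case (_, rackBrokers) =>
      val inRack = rackBrokers.toSet
      // Keep only the replicas hosted in this rack.
      val slice = current
        .map { case (partition, replicas) => partition -> replicas.filter(inRack) }
        .filter { case (_, replicas) => replicas.nonEmpty }
      rebalance(rackBrokers, slice)
    }

    // Merge: concatenate each partition's replicas from all racks.
    perRackResults.flatten
      .groupBy(_._1)
      .map { case (partition, parts) => partition -> parts.flatMap(_._2) }
  }
}
```

Because each call only sees brokers of one rack, a replica can never leave its rack, so the rack spread of every partition is preserved. The merge step in this sketch does not preserve the original replica order, so a real implementation would also need to keep track of the preferred leader.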
The open question is how to obtain the broker-to-rack mapping. The information can be supplied when Kafka registers the broker with ZooKeeper, which means some information has to be added to ZooKeeper. However, it could be that the rack information is already available in a deployment-independent way. For example, in some deployments the rack information may be available in a database. What we can do is abstract the API required to obtain rack information into an interface and allow the user to supply an implementation on the command line or at broker startup (to handle auto topic creation).
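As a rough illustration of such an interface, here is one possible shape. Nothing here exists in Kafka today: RackLocator, StaticRackLocator and the load helper are hypothetical names, and loading by class name assumes the user's implementation has a public no-argument constructor.

```scala
// Hypothetical interface: the only thing the assignment code needs to know
// is which rack each broker is in.
trait RackLocator {
  def brokerToRack(): Map[Int, String]
}

// Trivial implementation backed by a fixed map, e.g. parsed from a config
// value. A deployment could instead query ZooKeeper or a database.
class StaticRackLocator(mapping: Map[Int, String]) extends RackLocator {
  override def brokerToRack(): Map[Int, String] = mapping
}

object RackLocator {
  // Instantiate a user-supplied implementation from its class name, as it
  // might be passed on the command line or in the broker configuration.
  def load(className: String): RackLocator =
    Class.forName(className)
      .getDeclaredConstructor()
      .newInstance()
      .asInstanceOf[RackLocator]
}
```

The map returned by brokerToRack() would then be passed as the extra Map[Int, String] argument discussed above, so the assignment code does not need to know where the rack information comes from.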