Affects Version/s: None
Fix Version/s: 2.2.0
Currently a broker can process controller requests that were generated before the broker was restarted. This could cause a few problems. Here is one example:
Let's assume partitions p1 and p2 exist on broker1.
1) Controller generates LeaderAndIsrRequest with p1 to be sent to broker1.
2) Before controller sends the request, broker1 is quickly restarted.
3) The LeaderAndIsrRequest with p1 is delivered to broker1.
4) After processing the first LeaderAndIsrRequest, broker1 starts to checkpoint the high watermark for all partitions that it owns. Because that stale request contains only p1, broker1 may overwrite the high watermark checkpoint file with only the hw for partition p1. The hw for partition p2 is then lost, which could be a problem.
In general, the correctness of the broker logic currently relies on a few assumptions, e.g. that the first LeaderAndIsrRequest received by a broker contains all partitions hosted by that broker. These assumptions could break if the broker can receive controller requests that were generated before it restarted.
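The failure in step 4 can be illustrated with a toy model. This is only a sketch under simplifying assumptions, not Kafka's actual code: the class HwCheckpointSketch, its methods, and the hw values 100 and 200 are all hypothetical, and an in-memory map stands in for the checkpoint file.

```java
import java.util.HashMap;
import java.util.Map;
import java.util.Set;

// Toy model of the lost-hw scenario. All names are hypothetical, not Kafka classes.
class HwCheckpointSketch {
    // Stands in for the on-disk high watermark checkpoint file.
    private final Map<String, Long> checkpointFile = new HashMap<>();
    // Partitions the broker currently believes it hosts, with their in-memory hw.
    private final Map<String, Long> hostedPartitions = new HashMap<>();

    HwCheckpointSketch(Map<String, Long> checkpointOnDisk) {
        checkpointFile.putAll(checkpointOnDisk);
    }

    // The broker only learns about the partitions named in the request,
    // loading each one's hw from the checkpoint file.
    void handleLeaderAndIsr(Set<String> partitionsInRequest) {
        for (String partition : partitionsInRequest) {
            hostedPartitions.put(partition, checkpointFile.getOrDefault(partition, 0L));
        }
    }

    // The whole file is rewritten from the hosted partitions, so anything
    // the broker does not know about is dropped.
    void checkpointHighWatermarks() {
        checkpointFile.clear();
        checkpointFile.putAll(hostedPartitions);
    }

    Map<String, Long> checkpointFile() {
        return new HashMap<>(checkpointFile);
    }

    public static void main(String[] args) {
        // Before the restart, the checkpoint file holds hw for both p1 and p2.
        HwCheckpointSketch broker1 = new HwCheckpointSketch(Map.of("p1", 100L, "p2", 200L));
        // After the restart, the stale request naming only p1 arrives first.
        broker1.handleLeaderAndIsr(Set.of("p1"));
        broker1.checkpointHighWatermarks();
        System.out.println("p2 still checkpointed: " + broker1.checkpointFile().containsKey("p2"));
    }
}
```

In this model the hw for p2 disappears because the checkpoint is rewritten from the broker's current view rather than merged with what is already on disk.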
One reasonable solution to the problem is to include the expectedBrokeNodeZkVersion in the controller requests. The broker should remember its broker znode zkVersion after it registers itself in ZooKeeper. The broker can then reject any controller request whose expectedBrokeNodeZkVersion differs from its broker znode zkVersion.
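The proposed check might look roughly like the following. This is a minimal sketch, not an implementation from the Kafka code base: the class BrokerVersionGuardSketch and the method shouldProcess are hypothetical, and it assumes, as the proposal does, that the zkVersion the broker remembers after a restart differs from the one the controller saw before the restart.

```java
// Sketch of the proposed stale-request check. Apart from the field name
// expectedBrokeNodeZkVersion taken from the proposal, all names are hypothetical.
class BrokerVersionGuardSketch {
    // zkVersion of the broker znode, remembered right after registration.
    private final int brokerZnodeZkVersion;

    BrokerVersionGuardSketch(int zkVersionAtRegistration) {
        this.brokerZnodeZkVersion = zkVersionAtRegistration;
    }

    // A controller request is processed only if it was generated for the
    // current registration of this broker.
    boolean shouldProcess(int expectedBrokeNodeZkVersion) {
        return expectedBrokeNodeZkVersion == brokerZnodeZkVersion;
    }

    public static void main(String[] args) {
        // Assumed values: the controller saw version 3 before the restart,
        // and the broker remembers version 4 after re-registering.
        BrokerVersionGuardSketch broker1 = new BrokerVersionGuardSketch(4);
        System.out.println("stale request accepted: " + broker1.shouldProcess(3));
        System.out.println("fresh request accepted: " + broker1.shouldProcess(4));
    }
}
```

With such a check, the LeaderAndIsrRequest from step 3 would be rejected, and broker1 would wait for a request generated after its restart.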