Description
In MetadataVersion 3.7-IV2 and above, the broker's AssignmentsManager sends an RPC (AssignReplicasToDirsRequest) to the controller to tell it which directory the broker has chosen to place a replica on. Unfortunately, the code does not check whether the topic still exists in the MetadataImage before sending the RPC, and it retries indefinitely. Therefore, when a topic is created and deleted in rapid succession, the broker can get stuck retrying the AssignReplicasToDirsRequest forever.
To prevent this problem, the AssignmentsManager should check that a topic still exists (and that the replica is still present on the broker in question) before sending the RPC. To prevent log spam, we should not log any error messages until several minutes have passed without success. Finally, rather than creating a new EventQueue event for each assignment request, we should simply modify a shared data structure and schedule a deferred event to send the accumulated RPCs. This avoids enqueuing one event per assignment and lets several assignments be sent together.
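The three proposed changes can be sketched roughly as follows. This is a minimal illustration, not the actual AssignmentsManager code: the class name `AssignmentBatcher`, its methods, the string keys for replicas and directories, and the five-minute threshold are all hypothetical, and the check against the MetadataImage is stood in for by a plain `Predicate`.

```java
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.function.Predicate;

/** Hypothetical sketch of batching directory assignments; not Kafka's real API. */
final class AssignmentBatcher {
    // Assumed value for "several minutes"; the ticket does not give a number.
    static final long LOG_ERRORS_AFTER_MS = 5 * 60 * 1000L;

    // Shared structure: replica (e.g. "foo-0") -> chosen directory id.
    private final Map<String, String> pending = new LinkedHashMap<>();
    // Stand-in for "topic still exists and replica is still on this broker".
    private final Predicate<String> replicaStillHosted;
    private boolean sendScheduled = false;
    private long firstFailureMs = -1L;

    AssignmentBatcher(Predicate<String> replicaStillHosted) {
        this.replicaStillHosted = replicaStillHosted;
    }

    /** Records an assignment. Returns true only if the caller must schedule a deferred send. */
    synchronized boolean add(String replica, String directoryId) {
        pending.put(replica, directoryId);
        if (sendScheduled) {
            return false; // a deferred send is already queued and will pick this up
        }
        sendScheduled = true;
        return true;
    }

    /** Run by the deferred event: drops stale replicas and returns what should be sent. */
    synchronized Map<String, String> drainLive() {
        sendScheduled = false;
        pending.keySet().removeIf(replica -> !replicaStillHosted.test(replica));
        return new LinkedHashMap<>(pending);
    }

    /** Clears acknowledged entries, unless a newer directory was recorded meanwhile. */
    synchronized void onSuccess(Map<String, String> sent) {
        sent.forEach((replica, directoryId) -> pending.remove(replica, directoryId));
        firstFailureMs = -1L;
    }

    /** Returns true once failures have persisted long enough to justify an error log. */
    synchronized boolean onFailure(long nowMs) {
        if (firstFailureMs < 0) {
            firstFailureMs = nowMs;
        }
        return nowMs - firstFailureMs >= LOG_ERRORS_AFTER_MS;
    }
}
```

In this sketch a caller would schedule one deferred event whenever `add` returns true, call `drainLive` inside that event to build the request, and reschedule after `onFailure`. Because deleted topics are filtered out in `drainLive`, a topic that was created and deleted in rapid succession simply drops out of the pending set instead of being retried forever.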
This has been backported to 3.7 and 3.8:
https://github.com/apache/kafka/commit/431c00d80241506ea34ea8a00f1b67034956b53d
https://github.com/apache/kafka/commit/0afab4b39317732d4d30db59f1edc560a99fde08
Updating the fix versions.