Details
Type: Improvement
Status: Resolved
Priority: Major
Resolution: Fixed
Description
When the Cluster Coordinator receives a heartbeat, it responds based on the node's current connection state. If the node is disconnected, the Coordinator either notifies the node that it is disconnected, so that it stops heartbeating, or requests that the node reconnect to the cluster.
Because of changes made in 1.16, along with a few made since, we can be much more lenient about when we ask a disconnected node to reconnect rather than disconnect. For example, if a node was disconnected because it failed to handle an update request, we previously had to request that the node disconnect again. Now, however, we can ask the node to reconnect, since it may well be able to reconcile any differences and rejoin.
Currently, we even request that a node disconnect when we receive a heartbeat from a node whose last recorded state was "Disconnected because Node was Shutdown". We should definitely be more lenient in this case, as it occasionally causes System Test failures (e.g., https://github.com/apache/nifi/actions/runs/6498488206).
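The more lenient policy described here can be pictured as a small decision function. This is a minimal, hypothetical sketch, not NiFi's actual code: the class, enum, and method names (`LenientHeartbeatPolicy`, `NodeState`, `DisconnectReason`, `Response`, `respond`) are invented for illustration, the node states are reduced to two, and the rule that only an explicit user disconnect keeps a node out of the cluster is an assumption made for this example rather than something stated in this issue.

```java
import java.util.EnumSet;
import java.util.Set;

// Hypothetical, simplified model of how a cluster coordinator could answer a
// heartbeat. None of these names are NiFi's real API; they only illustrate
// the "reconnect unless there is a good reason not to" policy.
public class LenientHeartbeatPolicy {

    /** Last connection state the coordinator recorded (simplified to two states). */
    enum NodeState { CONNECTED, DISCONNECTED }

    /** Why the node was disconnected (hypothetical subset of reasons). */
    enum DisconnectReason {
        USER_DISCONNECTED,
        NODE_SHUTDOWN,
        FAILED_TO_HANDLE_UPDATE_REQUEST,
        LACK_OF_HEARTBEAT
    }

    /** What the coordinator tells the node in reply to its heartbeat. */
    enum Response { ACKNOWLEDGE, STAY_DISCONNECTED, REQUEST_RECONNECT }

    // Assumption: only a deliberate administrator action keeps a node out of
    // the cluster; every other reason is treated as something the node may be
    // able to reconcile on its own by reconnecting.
    private static final Set<DisconnectReason> KEEP_DISCONNECTED =
            EnumSet.of(DisconnectReason.USER_DISCONNECTED);

    static Response respond(NodeState lastState, DisconnectReason reason) {
        if (lastState == NodeState.CONNECTED) {
            // Connected node: simply accept the heartbeat.
            return Response.ACKNOWLEDGE;
        }
        // A disconnected node is still heartbeating: tell it to stop only if it
        // was deliberately disconnected; otherwise invite it to rejoin.
        return KEEP_DISCONNECTED.contains(reason)
                ? Response.STAY_DISCONNECTED
                : Response.REQUEST_RECONNECT;
    }
}
```

Under this sketch, a heartbeat from a node last recorded as disconnected with `NODE_SHUTDOWN` yields `REQUEST_RECONNECT` instead of another disconnect request, which is the case behind the occasional System Test failures. Keeping the "stay disconnected" reasons in a single `EnumSet` makes the leniency easy to tighten or relax in one place.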