Details
Type: Sub-task
Status: Resolved
Priority: Major
Resolution: Fixed
Description
ClusterMembershipMgr::UpdateMembership removes a node from the blacklist (if it is on the blacklist) whenever it receives an update about that node from the Statestore. Currently, the Statestore should only send an update about a node when the node starts quiescing. A node that starts quiescing should be removed from the blacklist, since quiescing nodes aren't part of any executor group anyway (no queries should be scheduled on them).
After running some experiments locally, it seems there are other cases where the Statestore sends the ClusterMembershipMgr an update about a node even if its quiescing state has not changed. Unfortunately, I haven't been able to fully track down what triggers this; so far it only happens on cluster startup.
The ClusterMembershipMgr should only un-blacklist a node if that node is quiescing. Currently, it un-blacklists a node on any update to the node.
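A minimal sketch of the proposed check, for illustration only. The types BackendUpdate and Blacklist and the helper MaybeUnblacklist are hypothetical stand-ins, not the actual Impala classes or the code inside UpdateMembership; the point is just that removal from the blacklist is gated on the quiescing flag rather than on the mere arrival of an update.

```cpp
#include <set>
#include <string>

// Hypothetical, simplified stand-in for a per-node update from the Statestore.
struct BackendUpdate {
  std::string address;
  bool is_quiescing;
};

// Hypothetical, simplified blacklist keyed by node address.
class Blacklist {
 public:
  void Add(const std::string& address) { nodes_.insert(address); }
  bool Contains(const std::string& address) const {
    return nodes_.count(address) > 0;
  }
  // Returns true if the node was on the blacklist and has been removed.
  bool Remove(const std::string& address) { return nodes_.erase(address) > 0; }

 private:
  std::set<std::string> nodes_;
};

// Un-blacklists the node only if the update says it is quiescing.
// Any other update leaves the blacklist untouched.
// Returns true if a node was actually removed from the blacklist.
bool MaybeUnblacklist(const BackendUpdate& update, Blacklist* blacklist) {
  if (!update.is_quiescing) return false;
  return blacklist->Remove(update.address);
}
```

With this shape, a spurious update at cluster startup for a blacklisted, non-quiescing node would be a no-op for the blacklist.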