Details
Type: Sub-task
Status: Resolved
Priority: Major
Resolution: Fixed
Description
HDDS-4946 introduced a race condition in NodeStateManager#addNode. SCM's background pipeline creator, or any other thread, can read a node whose metadata layout version (MLV) is lower than SCM's as HEALTHY before the node is moved to the HEALTHY_READONLY state.
public void addNode(DatanodeDetails datanodeDetails,
    LayoutVersionProto layoutInfo) throws NodeAlreadyExistsException {
  NodeStatus newNodeStatus = newNodeStatus(datanodeDetails);
  nodeStateMap.addNode(datanodeDetails, newNodeStatus, layoutInfo);
  UUID dnID = datanodeDetails.getUuid();
  try {
    updateLastKnownLayoutVersion(datanodeDetails, layoutInfo);
    DatanodeInfo dnInfo = nodeStateMap.getNodeInfo(dnID);
    NodeStatus status = nodeStateMap.getNodeStatus(dnID);
    // State machine starts nodes as HEALTHY. If there is a layout
    // mismatch, this node should be moved to HEALTHY_READONLY.
    updateNodeLayoutVersionState(dnInfo, layoutMisMatchCondition, status,
        NodeLifeCycleEvent.LAYOUT_MISMATCH);
  } catch (NodeNotFoundException ex) {
    LOG.error("Inconsistent NodeStateMap! Datanode with ID {} was " +
        "added but not found in map: {}", dnID, nodeStateMap);
  }
  eventPublisher.fireEvent(SCMEvents.NEW_NODE, datanodeDetails);
}
The node is added to the node state map (where other threads can view it) before its layout version information is updated.
This manifests as an intermittent test failure in TestSCMNodeManager#testSCMLayoutOnRegister, which fails due to this condition after about 15-30 consecutive runs.
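The ordering problem described above can be reproduced in isolation. The following is a minimal sketch and not the Ozone code or necessarily the fix applied for this issue: NodeRegistrySketch, addNodeRacy, addNodeSafe and the betweenSteps hook are hypothetical names, layout versions are reduced to plain integers, and the hook stands in for a concurrent reader such as the background pipeline creator. It contrasts publishing the node before deciding its state with deciding the state first and publishing once.

```java
import java.util.Map;
import java.util.UUID;
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.atomic.AtomicReference;

// Hypothetical, simplified stand-in for a node state map; not an Ozone class.
public class NodeRegistrySketch {
  enum NodeState { HEALTHY, HEALTHY_READONLY }

  private final int scmLayoutVersion;
  private final Map<UUID, NodeState> nodes = new ConcurrentHashMap<>();

  NodeRegistrySketch(int scmLayoutVersion) {
    this.scmLayoutVersion = scmLayoutVersion;
  }

  // Racy ordering: the node is published as HEALTHY, then corrected.
  // betweenSteps runs in the window where another thread could read the map.
  void addNodeRacy(UUID id, int nodeLayoutVersion, Runnable betweenSteps) {
    nodes.put(id, NodeState.HEALTHY);
    betweenSteps.run();
    if (nodeLayoutVersion < scmLayoutVersion) {
      nodes.put(id, NodeState.HEALTHY_READONLY);
    }
  }

  // Safer ordering: decide the initial state first, then publish once.
  // A reader in the same window sees no entry rather than a wrong state.
  void addNodeSafe(UUID id, int nodeLayoutVersion, Runnable betweenSteps) {
    NodeState initial = nodeLayoutVersion < scmLayoutVersion
        ? NodeState.HEALTHY_READONLY
        : NodeState.HEALTHY;
    betweenSteps.run();
    nodes.put(id, initial);
  }

  NodeState getState(UUID id) {
    return nodes.get(id);
  }

  public static void main(String[] args) {
    NodeRegistrySketch registry = new NodeRegistrySketch(2);
    AtomicReference<NodeState> seen = new AtomicReference<>();

    UUID racyNode = UUID.randomUUID();
    registry.addNodeRacy(racyNode, 1, () -> seen.set(registry.getState(racyNode)));
    System.out.println("racy reader saw: " + seen.get());   // HEALTHY

    UUID safeNode = UUID.randomUUID();
    registry.addNodeSafe(safeNode, 1, () -> seen.set(registry.getState(safeNode)));
    System.out.println("safe reader saw: " + seen.get());   // null
  }
}
```

In both variants the node ends up HEALTHY_READONLY; the difference is only what a reader can observe in between. In the real code the same effect would presumably require the layout check to happen before, or atomically with, the insertion into the node state map.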