Description
SCM crashes with the following exception when ReplicationManager tries to re-replicate under-replicated containers:
2019-07-08 12:46:36 ERROR ReplicationManager:215 - Exception in Replication Monitor Thread.
java.lang.IllegalArgumentException: Affinity node /default-rack/aab15e2d07cc is not a member of topology
    at org.apache.hadoop.hdds.scm.net.NetworkTopologyImpl.checkAffinityNode(NetworkTopologyImpl.java:767)
    at org.apache.hadoop.hdds.scm.net.NetworkTopologyImpl.chooseRandom(NetworkTopologyImpl.java:407)
    at org.apache.hadoop.hdds.scm.container.placement.algorithms.SCMContainerPlacementRackAware.chooseNode(SCMContainerPlacementRackAware.java:242)
    at org.apache.hadoop.hdds.scm.container.placement.algorithms.SCMContainerPlacementRackAware.chooseDatanodes(SCMContainerPlacementRackAware.java:168)
    at org.apache.hadoop.hdds.scm.container.ReplicationManager.handleUnderReplicatedContainer(ReplicationManager.java:487)
    at org.apache.hadoop.hdds.scm.container.ReplicationManager.processContainer(ReplicationManager.java:293)
    at java.base/java.util.concurrent.ConcurrentHashMap$KeySetView.forEach(ConcurrentHashMap.java:4698)
    at java.base/java.util.Collections$UnmodifiableCollection.forEach(Collections.java:1083)
    at org.apache.hadoop.hdds.scm.container.ReplicationManager.run(ReplicationManager.java:205)
    at java.base/java.lang.Thread.run(Thread.java:834)
2019-07-08 12:46:36 INFO ExitUtil:210 - Exiting with status 1: java.lang.IllegalArgumentException: Affinity node /default-rack/aab15e2d07cc is not a member of topology
2019-07-08 12:46:36 INFO StorageContainerManagerStarter:51 - SHUTDOWN_MSG:
/************************************************************
SHUTDOWN_MSG: Shutting down StorageContainerManager at 8c763563f672/192.168.112.2
************************************************************/
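To illustrate the failure mode in the trace, here is a minimal sketch of a membership check that rejects an affinity node whose path was never registered. It is not the actual Ozone code: the class `TopologySketch`, its methods, and the path `/default-rack/node-a` are hypothetical names chosen for illustration, and only the exception message is taken from the log above.

```java
import java.util.HashSet;
import java.util.Set;

// Hypothetical, simplified stand-in for a network topology.
// This is NOT org.apache.hadoop.hdds.scm.net.NetworkTopologyImpl.
final class TopologySketch {
    // Full network paths of the nodes registered in the topology.
    private final Set<String> members = new HashSet<>();

    void add(String path) {
        members.add(path);
    }

    // Rejects an affinity node that is not registered, using the same
    // message that appears in the log.
    void checkAffinityNode(String affinityPath) {
        if (!members.contains(affinityPath)) {
            throw new IllegalArgumentException(
                "Affinity node " + affinityPath + " is not a member of topology");
        }
    }

    // Illustration only (an assumption, not the fix from HDDS-1713):
    // lets a caller observe the failure as a boolean instead of letting
    // the exception escape and end the calling thread.
    static boolean tryCheck(TopologySketch topology, String affinityPath) {
        try {
            topology.checkAffinityNode(affinityPath);
            return true;
        } catch (IllegalArgumentException e) {
            return false;
        }
    }
}
```

In this sketch, a topology that only contains `/default-rack/node-a` accepts that path and rejects `/default-rack/aab15e2d07cc`. When such an exception is not caught inside the monitor loop, it propagates out of `run()`, which matches the "Exiting with status 1" line in the log.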
Issue Links
- duplicates HDDS-1713 ReplicationManager fail to find proper node topology based on Datanode details from heartbeat (Resolved)