Details
- Bug
- Status: Resolved
- Major
- Resolution: Fixed
- 0.4.1
- None
Description
I found that any container replication error thrown in ReplicationManager can terminate the SCM service. Terminating the SCM service because of a single container replication error is very expensive.
It is not worth shutting down the SCM for this. We can handle it more gracefully: catch the exception and log a warning message that includes the thrown exception.
The shutdown info:
2020-01-30 08:16:04,705 ERROR org.apache.hadoop.hdds.scm.container.ReplicationManager: Exception in Replication Monitor Thread.
java.lang.IllegalArgumentException: Affinity node /dc1/rack1/b9343ca0-a4bc-4436-9671-bc1de6c8bd89 is not a member of topology
    at org.apache.hadoop.hdds.scm.net.NetworkTopologyImpl.checkAffinityNode(NetworkTopologyImpl.java:789)
    at org.apache.hadoop.hdds.scm.net.NetworkTopologyImpl.chooseRandom(NetworkTopologyImpl.java:399)
    at org.apache.hadoop.hdds.scm.container.placement.algorithms.SCMContainerPlacementRackAware.chooseNode(SCMContainerPlacementRackAware.java:249)
    at org.apache.hadoop.hdds.scm.container.placement.algorithms.SCMContainerPlacementRackAware.chooseDatanodes(SCMContainerPlacementRackAware.java:173)
    at org.apache.hadoop.hdds.scm.container.ReplicationManager.handleUnderReplicatedContainer(ReplicationManager.java:515)
    at org.apache.hadoop.hdds.scm.container.ReplicationManager.processContainer(ReplicationManager.java:311)
    at java.util.concurrent.ConcurrentHashMap$KeySetView.forEach(ConcurrentHashMap.java:4649)
    at java.util.Collections$UnmodifiableCollection.forEach(Collections.java:1080)
    at org.apache.hadoop.hdds.scm.container.ReplicationManager.run(ReplicationManager.java:223)
    at java.lang.Thread.run(Thread.java:745)
2020-01-30 08:16:04,730 INFO org.apache.hadoop.util.ExitUtil: Exiting with status 1: java.lang.IllegalArgumentException: Affinity node /dc1/rack1/b9343ca0-a4bc-4436-9671-bc1de6c8bd89 is not a member of topology
2020-01-30 08:16:04,734 INFO org.apache.hadoop.hdds.scm.server.StorageContainerManagerStarter: SHUTDOWN_MSG:
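A minimal sketch of the handling proposed in the description, not the actual patch: each container is processed inside its own try/catch, so a failure on one container is logged as a warning and the loop moves on to the next one. The class ReplicationLoopSketch, the method processAll, and the string container IDs are hypothetical names used only for illustration; the real ReplicationManager code and its logging framework may differ.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.function.Consumer;
import java.util.logging.Level;
import java.util.logging.Logger;

// Hypothetical stand-in for the replication monitor loop; not the real ReplicationManager.
public class ReplicationLoopSketch {
  private static final Logger LOG =
      Logger.getLogger(ReplicationLoopSketch.class.getName());

  /**
   * Runs the processor on every container ID. A runtime exception from one
   * container is logged as a warning and does not stop the loop.
   * Returns the IDs whose processing failed.
   */
  static List<String> processAll(List<String> containerIds,
                                 Consumer<String> processor) {
    List<String> failed = new ArrayList<>();
    for (String id : containerIds) {
      try {
        processor.accept(id);
      } catch (RuntimeException e) {
        // Warn with the thrown exception instead of exiting the service.
        LOG.log(Level.WARNING,
            "Replication of container " + id + " failed; continuing", e);
        failed.add(id);
      }
    }
    return failed;
  }

  public static void main(String[] args) {
    List<String> processed = new ArrayList<>();
    List<String> failed = processAll(Arrays.asList("c1", "c2", "c3"), id -> {
      if (id.equals("c2")) {
        // Simulates the placement failure seen in the log above.
        throw new IllegalArgumentException(
            "Affinity node is not a member of topology");
      }
      processed.add(id);
    });
    // prints: processed=[c1, c3] failed=[c2]
    System.out.println("processed=" + processed + " failed=" + failed);
  }
}
```

Catching per container rather than around the whole loop means the remaining containers are still checked in the same pass; the failed container would simply be retried on the next monitor run.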