Details
-
Bug
-
Status: Resolved
-
Critical
-
Resolution: Fixed
-
None
Description
Intermittent failure in secure acceptance tests indicates that datanode may fail to start up if SCM is not yet ready:
datanode_3 | STARTUP_MSG: Starting HddsDatanodeService ... datanode_3 | 2021-04-19 08:20:29,030 [main] INFO ozone.HddsDatanodeService: Creating csr for DN-> subject:dn@627dcb55b990 ... datanode_3 | 2021-04-19 08:20:57,660 [main] INFO retry.RetryInvocationHandler: com.google.protobuf.ServiceException: java.net.ConnectException: Call From 627dcb55b990/172.26.0.4 to scm:9961 failed on connection exception: java.net.ConnectException: Connection refused; For more details see: http://wiki.apache.org/hadoop/ConnectionRefused, while invoking $Proxy18.submitRequest over nodeId=scmNodeId,nodeAddress=scm/172.26.0.10:9961 after 14 failover attempts. Trying to failover after sleeping for 2000ms. datanode_3 | 2021-04-19 08:20:59,667 [main] ERROR ozone.HddsDatanodeService: Error while storing SCM signed certificate. ... datanode_3 | at org.apache.hadoop.hdds.protocolPB.SCMSecurityProtocolClientSideTranslatorPB.submitRequest(SCMSecurityProtocolClientSideTranslatorPB.java:104) datanode_3 | at org.apache.hadoop.hdds.protocolPB.SCMSecurityProtocolClientSideTranslatorPB.getDataNodeCertificateChain(SCMSecurityProtocolClientSideTranslatorPB.java:263) datanode_3 | at org.apache.hadoop.ozone.HddsDatanodeService.getSCMSignedCert(HddsDatanodeService.java:349) datanode_3 | at org.apache.hadoop.ozone.HddsDatanodeService.initializeCertificateClient(HddsDatanodeService.java:320) datanode_3 | at org.apache.hadoop.ozone.HddsDatanodeService.start(HddsDatanodeService.java:248) datanode_3 | at org.apache.hadoop.ozone.HddsDatanodeService.start(HddsDatanodeService.java:192) ... datanode_3 | SHUTDOWN_MSG: Shutting down HddsDatanodeService at 627dcb55b990/172.26.0.4
Attachments
Issue Links
- links to