Apache Ozone / HDDS-5116

Secure datanode/OM may exit if it cannot connect to SCM


Details

    Description

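      An intermittent failure in the secure acceptance tests indicates that the datanode may fail to start up if SCM is not yet ready. The datanode gives up on requesting its signed certificate from SCM after the client's failover attempts are exhausted, and then shuts down: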

      datanode_3  | STARTUP_MSG: Starting HddsDatanodeService
      ...
      datanode_3  | 2021-04-19 08:20:29,030 [main] INFO ozone.HddsDatanodeService: Creating csr for DN-> subject:dn@627dcb55b990
      ...
      datanode_3  | 2021-04-19 08:20:57,660 [main] INFO retry.RetryInvocationHandler: com.google.protobuf.ServiceException: java.net.ConnectException: Call From 627dcb55b990/172.26.0.4 to scm:9961 failed on connection exception: java.net.ConnectException: Connection refused; For more details see:  http://wiki.apache.org/hadoop/ConnectionRefused, while invoking $Proxy18.submitRequest over nodeId=scmNodeId,nodeAddress=scm/172.26.0.10:9961 after 14 failover attempts. Trying to failover after sleeping for 2000ms.
      datanode_3  | 2021-04-19 08:20:59,667 [main] ERROR ozone.HddsDatanodeService: Error while storing SCM signed certificate.
      ...
      datanode_3  | 	at org.apache.hadoop.hdds.protocolPB.SCMSecurityProtocolClientSideTranslatorPB.submitRequest(SCMSecurityProtocolClientSideTranslatorPB.java:104)
      datanode_3  | 	at org.apache.hadoop.hdds.protocolPB.SCMSecurityProtocolClientSideTranslatorPB.getDataNodeCertificateChain(SCMSecurityProtocolClientSideTranslatorPB.java:263)
      datanode_3  | 	at org.apache.hadoop.ozone.HddsDatanodeService.getSCMSignedCert(HddsDatanodeService.java:349)
      datanode_3  | 	at org.apache.hadoop.ozone.HddsDatanodeService.initializeCertificateClient(HddsDatanodeService.java:320)
      datanode_3  | 	at org.apache.hadoop.ozone.HddsDatanodeService.start(HddsDatanodeService.java:248)
      datanode_3  | 	at org.apache.hadoop.ozone.HddsDatanodeService.start(HddsDatanodeService.java:192)
      ...
      datanode_3  | SHUTDOWN_MSG: Shutting down HddsDatanodeService at 627dcb55b990/172.26.0.4
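In the log, the certificate request fails with a `java.net.ConnectException` wrapped in a `com.google.protobuf.ServiceException`, and once the retry handler stops failing over, the error escapes `HddsDatanodeService.start` and the process exits. One way to tolerate an SCM that is not yet ready is to keep retrying the startup call against an overall deadline instead of exiting. The following is a minimal sketch under that assumption. `ScmStartupRetry`, `retryUntilAvailable` and `isConnectionFailure` are hypothetical names, not part of Ozone, and this is not necessarily how the issue was resolved:

```java
import java.io.IOException;
import java.net.ConnectException;
import java.time.Duration;
import java.util.concurrent.Callable;

/**
 * Hypothetical helper (not Ozone API): retry a startup call, such as
 * requesting the signed certificate from SCM, while SCM is unreachable.
 */
public final class ScmStartupRetry {

  private ScmStartupRetry() {
  }

  /** True if a ConnectException appears anywhere in the cause chain. */
  static boolean isConnectionFailure(Throwable t) {
    // The log shows ConnectException wrapped in ServiceException, so walk causes.
    for (Throwable c = t; c != null; c = c.getCause()) {
      if (c instanceof ConnectException) {
        return true;
      }
    }
    return false;
  }

  /**
   * Invokes {@code call} until it succeeds, fails with a non-connection
   * error, or {@code timeout} elapses, sleeping {@code interval} between tries.
   */
  static <T> T retryUntilAvailable(Callable<T> call, Duration timeout,
      Duration interval) throws Exception {
    long deadline = System.nanoTime() + timeout.toNanos();
    while (true) {
      try {
        return call.call();
      } catch (Exception e) {
        // Only "SCM not reachable yet" is worth waiting for; any other
        // error, or running out of time, is propagated to the caller.
        if (!isConnectionFailure(e) || System.nanoTime() >= deadline) {
          throw e;
        }
        Thread.sleep(interval.toMillis());
      }
    }
  }

  public static void main(String[] args) throws Exception {
    int[] attempts = {0};
    // Simulated SCM that refuses the first two connections.
    String cert = retryUntilAvailable(() -> {
      if (++attempts[0] < 3) {
        throw new IOException(new ConnectException("Connection refused"));
      }
      return "signed-cert";
    }, Duration.ofSeconds(10), Duration.ofMillis(10));
    System.out.println(cert + " after " + attempts[0] + " attempts");
  }
}
```

Running `main` prints `signed-cert after 3 attempts`. Only connection failures are retried, so an unrelated error still fails fast, and the timeout bounds how long the node waits for SCM before giving up.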
      


People

  Bharat Viswanadham (bharat)
  Attila Doroszlai (adoroszlai)
  Votes: 0
  Watchers: 2
