Apache Ozone: HDDS-5078, sub-task of HDDS-2823 (SCM HA Support)

[SCM HA Security] NPE during secure SCM initialization with HA code updated to an already existing cluster


    Details

    • Type: Sub-task
    • Status: Resolved
    • Priority: Blocker
    • Resolution: Fixed
    • Affects Version/s: None
    • Fix Version/s: 1.2.0
    • Component/s: SCM HA, Security
    • Labels:
      None

      Description

      On a Cloudera Manager-managed cluster, SCM is always started with the --init option, and this behaviour revealed the following null pointer dereference:
      StorageContainerManager#initializeCertificateClient initializes the scmCertificateClient only if scmStorageConfig#checkPrimarySCMIdInitialized() evaluates to true. It evaluates to true only if the VERSION file contains a primaryScmNodeId entry with a value.

      If you upgrade an existing cluster with a single SCM to this code, the VERSION file does not contain a primaryScmNodeId, so the scmCertificateClient remains null.
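The guard described above can be modelled roughly as follows. This is a minimal sketch, assuming the VERSION file is read as java.util.Properties and that the check only tests for a non-empty primaryScmNodeId value; VersionFileSketch and CertClientInitSketch are hypothetical names, not Ozone classes.

```java
import java.util.Properties;

// Hypothetical, simplified model of the SCM VERSION file check.
// Names are illustrative only; they are not the real Ozone classes.
final class VersionFileSketch {
    // Key name taken from the issue text (assumed to be the literal key).
    static final String PRIMARY_SCM_NODE_ID = "primaryScmNodeId";

    private final Properties props;

    VersionFileSketch(Properties props) {
        this.props = props;
    }

    // True only when the key exists and carries a non-empty value.
    boolean checkPrimarySCMIdInitialized() {
        String id = props.getProperty(PRIMARY_SCM_NODE_ID);
        return id != null && !id.isEmpty();
    }
}

final class CertClientInitSketch {
    // Stand-in for the certificate client; null means "never created".
    Object scmCertificateClient;

    void initializeCertificateClient(VersionFileSketch version) {
        // The conditional creation described in the issue: on an upgraded
        // single-SCM cluster the key is absent, so the client stays null.
        if (version.checkPrimarySCMIdInitialized()) {
            scmCertificateClient = new Object();
        }
    }
}
```

With a VERSION file written before the HA code existed, the Properties object has no primaryScmNodeId entry, so scmCertificateClient is left null and every later unconditional use of it is a latent NPE.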

      Later, the initialization code calls StorageContainerManager#initializeCAnSecurityProtocol, which ends by creating the securityProtocolServer. The rootCACert argument for its constructor is obtained by calling scmCertificateClient#getCACertificate, which is a null dereference because scmCertificateClient is null.

      A null scmCertificateClient can cause further failures later as well, because it is used in several places without a null check.
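The failing call chain, and a fail-fast alternative, can be sketched as below. All types here are hypothetical stand-ins (certificates are modelled as plain strings), and the checked variant is only an illustration, not the fix that resolved this issue: passing client.getCACertificate() through a null client throws a bare NullPointerException, while an explicit Objects.requireNonNull check reports the likely cause at startup.

```java
import java.util.Objects;

// Hypothetical stand-in for the SCM certificate client.
interface CertificateClientSketch {
    String getCACertificate();
}

// Hypothetical stand-in for the security protocol server, whose
// constructor needs the root CA certificate.
final class SecurityServerSketch {
    final String rootCACert;

    SecurityServerSketch(String rootCACert) {
        this.rootCACert = rootCACert;
    }
}

final class InitSketch {
    // Mirrors the reported flow: throws a bare NPE when client is null.
    static SecurityServerSketch createUnsafe(CertificateClientSketch client) {
        return new SecurityServerSketch(client.getCACertificate());
    }

    // Fail-fast variant: the NPE message names the probable root cause.
    static SecurityServerSketch createChecked(CertificateClientSketch client) {
        Objects.requireNonNull(client,
            "SCM certificate client was not initialized; "
                + "is primaryScmNodeId missing from the VERSION file?");
        return new SecurityServerSketch(client.getCACertificate());
    }
}
```

Failing at construction time with a descriptive message keeps the error next to its cause, instead of letting a null field surface later in an unrelated RPC handler.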

      After working around this particular problem (by simply letting the code create the scmCertificateClient unconditionally), it turned out that the scmCertificateServer and rootCertificateServer instances also remain uninitialized in the StorageContainerManager#initializeCAnSecurityProtocol call, which causes problems when an SCM client tries to get the root CA certificate from the SCM.
      To me this suggests that SCM initialization fails after upgrading an old cluster. This worked fine before the upgrade: --init did not reinitialize anything, and SCM still started normally.
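Because fixing only the client exposed two more uninitialized components, a post-initialization check over all of them would have reported the whole problem at once. The sketch below is hypothetical: the field names mirror the components named in this issue, but SecurityInitCheckSketch and its methods are illustrative, not part of Ozone.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical post-initialization check for a secure SCM.
// Field names mirror the components named in the issue.
final class SecurityInitCheckSketch {
    Object scmCertificateClient;
    Object scmCertificateServer;
    Object rootCertificateServer;

    // Lists every security component that is still null.
    List<String> missingComponents() {
        List<String> missing = new ArrayList<>();
        if (scmCertificateClient == null) missing.add("scmCertificateClient");
        if (scmCertificateServer == null) missing.add("scmCertificateServer");
        if (rootCertificateServer == null) missing.add("rootCertificateServer");
        return missing;
    }

    // Refuses to continue startup if anything is missing.
    void validate() {
        List<String> missing = missingComponents();
        if (!missing.isEmpty()) {
            throw new IllegalStateException(
                "Secure SCM started with uninitialized components: " + missing);
        }
    }
}
```

In the situation described above, creating only the client would still leave scmCertificateServer and rootCertificateServer in the list, so validate() fails during startup rather than when a client later asks for the root CA certificate.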

      If I change the Cloudera Manager behaviour so that SCM is started without --init, I still get the same NPE from the SCM as with --init.
      The exception below appears in the SCM log; the command I issued was a recommission of a DataNode that had been decommissioned before the upgrade.

      java.lang.NullPointerException
      	at org.apache.hadoop.hdds.protocol.proto.SCMSecurityProtocolProtos$SCMGetCertResponseProto$Builder.setX509RootCACertificate(SCMSecurityProtocolProtos.java:9026)
      	at org.apache.hadoop.hdds.scm.protocol.SCMSecurityProtocolServerSideTranslatorPB.getCACertificate(SCMSecurityProtocolServerSideTranslatorPB.java:257)
      	at org.apache.hadoop.hdds.scm.protocol.SCMSecurityProtocolServerSideTranslatorPB.processRequest(SCMSecurityProtocolServerSideTranslatorPB.java:104)
      	at org.apache.hadoop.hdds.server.OzoneProtocolMessageDispatcher.processRequest(OzoneProtocolMessageDispatcher.java:87)
      	at org.apache.hadoop.hdds.scm.protocol.SCMSecurityProtocolServerSideTranslatorPB.submitRequest(SCMSecurityProtocolServerSideTranslatorPB.java:89)
      	at org.apache.hadoop.hdds.protocol.proto.SCMSecurityProtocolProtos$SCMSecurityProtocolService$2.callBlockingMethod(SCMSecurityProtocolProtos.java:10537)
      	at org.apache.hadoop.ipc.ProtobufRpcEngine$Server$ProtoBufRpcInvoker.call(ProtobufRpcEngine.java:528)
      	at org.apache.hadoop.ipc.RPC$Server.call(RPC.java:1070)
      	at org.apache.hadoop.ipc.Server$RpcCall.run(Server.java:986)
      	at org.apache.hadoop.ipc.Server$RpcCall.run(Server.java:914)
      	at java.security.AccessController.doPrivileged(Native Method)
      	at javax.security.auth.Subject.doAs(Subject.java:422)
      	at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1898)
      	at org.apache.hadoop.ipc.Server$Handler.run(Server.java:2887)
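The top frame shows where the null finally surfaces: protobuf-java generated builder setters reject null arguments with a NullPointerException, so a null root CA certificate returned to SCMSecurityProtocolServerSideTranslatorPB#getCACertificate fails inside setX509RootCACertificate. The sketch below models that with a hand-written builder (CertResponseBuilderSketch is a stand-in, not the generated SCMGetCertResponseProto.Builder) and a hypothetical translator-side guard that replaces the bare NPE with a descriptive error.

```java
import java.util.Objects;

// Hand-written stand-in for a protobuf-generated builder. Like generated
// protobuf-java setters, it rejects null with a NullPointerException.
final class CertResponseBuilderSketch {
    private String x509RootCACertificate;

    CertResponseBuilderSketch setX509RootCACertificate(String value) {
        this.x509RootCACertificate = Objects.requireNonNull(value);
        return this;
    }

    String build() {
        return x509RootCACertificate;
    }
}

final class TranslatorSketch {
    // Hypothetical guard: check the server-side result before it reaches
    // the builder, so the caller sees the cause rather than a bare NPE.
    static String getCACertificate(String rootCaFromServer) {
        if (rootCaFromServer == null) {
            throw new IllegalStateException(
                "SCM has no root CA certificate; security initialization may be incomplete");
        }
        return new CertResponseBuilderSketch()
            .setX509RootCACertificate(rootCaFromServer)
            .build();
    }
}
```

A guard like this only improves the error the client sees; the underlying problem remains the uninitialized certificate components described above.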
      

              People

              • Assignee:
                Bharat Viswanadham (bharat)
                Reporter:
                István Fajth (pifta)
              • Votes:
                0
                Watchers:
                2
