Description
When a cluster with SCM HA enabled and gRPC TLS turned on is upgraded from the 1.3 release to the new code, the SCMs stall during startup with the new code and cannot create the Ratis ring.
This happens because, during Ratis server setup, ReloadingX509(Key|Trust)Manager tries to create the CertPath object that the new code uses to identify a role. Creating the cert path requires reaching the SCM leader to get the CA certificates in the system. That request fails and is retried indefinitely, so the SCM remains stuck in this retry loop while it is still in the Ratis ring creation.
A possible workaround is to append the rootCA certificate to both the certificate.crt and the <certSerialID>.crt file in the certificate directory (<ozone.metadata.dirs>/scm/sub-CA/certs/).
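The workaround can be sketched as a small script. This is only an illustration under assumptions: the helper name `append_root_ca` is hypothetical, the location of the rootCA certificate file is not stated in this ticket and has to be supplied by the operator, and both files are assumed to be PEM encoded. The sketch keeps a `.bak` copy of each file it changes and skips files that already contain the rootCA certificate.

```python
import shutil
from pathlib import Path


def append_root_ca(root_ca_file: Path, cert_file: Path) -> bool:
    """Append the rootCA PEM block to cert_file unless it is already there.

    Hypothetical helper for illustration; returns True if cert_file changed.
    """
    root_pem = root_ca_file.read_text().strip()
    current = cert_file.read_text()
    if root_pem in current:
        return False  # already bundled, nothing to do
    # Keep a backup copy next to the original so the change can be reverted.
    shutil.copy2(cert_file, cert_file.with_name(cert_file.name + ".bak"))
    # Existing certificate first, rootCA certificate appended after it.
    cert_file.write_text(current.rstrip("\n") + "\n" + root_pem + "\n")
    return True
```

It would be called once for certificate.crt and once for the <certSerialID>.crt file in the certs folder, presumably while the affected SCM is stopped, passing the path of the rootCA certificate as the first argument.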
Issue Links
- duplicates
  - HDDS-9420 [Compatibility]Enabling GRPC encryption causes SCM startup failure. (Resolved)
- is related to
  - HDDS-7379 Use certificate bundles instead of the sole certificate (Resolved)
  - HDDS-7486 Support KeyStoreFactory which supports keyManager and trustManager reload (Resolved)
- relates to
  - HDDS-9420 [Compatibility]Enabling GRPC encryption causes SCM startup failure. (Resolved)