  1. Apache Ozone
  2. HDDS-9410

Upgrade from 1.3 to current code stalls when gRPC TLS is enabled.


Details

    • Type: Bug
    • Status: Resolved
    • Priority: Blocker
    • Resolution: Duplicate
    • Affects Version/s: 1.4.0
    • Fix Version/s: None
    • Component/s: Security
    • Labels: None

    Description

      When a cluster with SCM HA enabled and gRPC TLS turned on is upgraded from the 1.3 release to the new code, the SCMs stall and cannot create the Ratis ring when starting up with the new code.

      This happens because, during Ratis server setup, ReloadingX509(Key|Trust)Manager tries to create the CertPath object that the new code uses to identify a role. Creating the cert path requires reaching the SCM leader to get the CA certificates in the system. That call fails and is retried indefinitely, so the SCM remains stuck in this retry loop while it is still in the Ratis ring creation.

      A possible workaround is to append the rootCA certificate to both the certificate.crt file and the <certSerialID>.crt file in the certificate directory (the <ozone.metadata.dirs>/scm/sub-CA/certs/ folder).
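
The workaround could be scripted roughly as follows. This is only a minimal sketch, not a tool shipped with Ozone: the function name `append_root_ca` and its parameters are hypothetical, and the location of the rootCA PEM file is an assumption the caller must supply, since the issue does not state where it lives. The sketch keeps a `.bak` copy of each file before changing it and skips files that already contain the rootCA certificate.

```python
import shutil
from pathlib import Path


def append_root_ca(cert_dir, root_ca_pem, cert_serial_id):
    """Append the rootCA PEM to certificate.crt and <certSerialID>.crt.

    cert_dir:       the <ozone.metadata.dirs>/scm/sub-CA/certs/ folder
    root_ca_pem:    path to the rootCA certificate in PEM form (assumed input)
    cert_serial_id: serial ID used in the <certSerialID>.crt file name
    Returns the list of files that were modified.
    """
    cert_dir = Path(cert_dir)
    root_ca_text = Path(root_ca_pem).read_text().strip()
    modified = []
    for name in ("certificate.crt", f"{cert_serial_id}.crt"):
        target = cert_dir / name
        current = target.read_text()
        if root_ca_text in current:
            continue  # rootCA already appended; keep the operation idempotent
        # Keep a backup next to the original before rewriting it.
        shutil.copy2(target, target.with_name(target.name + ".bak"))
        target.write_text(current.rstrip("\n") + "\n" + root_ca_text + "\n")
        modified.append(target)
    return modified
```

A call would look like `append_root_ca(certs_folder, root_ca_file, serial_id)`, presumably run on each affected SCM while it is stopped. All three arguments are placeholders to be filled in for the cluster at hand.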

            People

              Assignee: István Fajth (pifta)
              Reporter: István Fajth (pifta)