  1. Apache Ozone
  2. HDDS-2823 SCM HA Support
  3. HDDS-4740

Admin command should take effect on all SCM instances

    Details

    • Type: Sub-task
    • Status: Resolved
    • Priority: Major
    • Resolution: Fixed
    • Affects Version/s: 1.1.0
    • Fix Version/s: None
    • Component/s: SCM HA

      Description

      Scope

      Admin commands in scope: ReplicationManager (rm) start/stop and safe mode exit.

       

      Requirement

      1. When the admin stops rm, rm must stop in every SCM, and a re-election must not cause rm to start on the new leader.

      2. When the admin starts rm, it takes effect only on the SCM that is the leader and out of safe mode. If the leader is in safe mode, an explicit admin start has no effect.

      3. An admin rm start/stop does not survive a restart of an SCM instance. After stopping rm for the SCM cluster, the admin must watch for SCM crashes, because a restarted SCM no longer remembers the stop.

       

      Status

      1. For now, admin rm start/stop creates/destroys the rm thread.

      2. SCMContainerLocationFailoverProxyProvider now works as a FailoverProxyProvider (fpp): it tries the SCMs listed in ozone.scm.names in round-robin order until one of them handles the request successfully. On the server side, every client request first goes through an isLeader check; a non-leader SCM returns a NotLeaderException (nle), which triggers fpp to fail over to the next SCM.

      3. SCMService decides whether the next rm iteration takes effect by switching between RUNNING and PAUSING.
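
      The current request path in item 2 can be sketched as below. This is a minimal illustration only, not Ozone code: NotLeaderSketchException, ScmNodeSketch and RoundRobinFailoverSketch are hypothetical names, and a single pass over the SCM list is an assumption made to keep the example short.

```java
import java.util.List;

// Hypothetical stand-in for the not-leader exception (nle) mentioned above.
class NotLeaderSketchException extends Exception {
  NotLeaderSketchException(String scmId) {
    super(scmId + " is not the leader");
  }
}

// Hypothetical, simplified SCM: it only models the server-side leader check.
class ScmNodeSketch {
  final String id;
  final boolean leader;

  ScmNodeSketch(String id, boolean leader) {
    this.id = id;
    this.leader = leader;
  }

  // Server side: check leadership first and reject the request on a follower.
  String handle(String request) throws NotLeaderSketchException {
    if (!leader) {
      throw new NotLeaderSketchException(id);
    }
    return id + " handled " + request;
  }
}

// Hypothetical client-side failover loop (assumption: one pass over the list).
class RoundRobinFailoverSketch {
  static String submit(List<ScmNodeSketch> scms, String request) {
    for (ScmNodeSketch scm : scms) {
      try {
        return scm.handle(request);
      } catch (NotLeaderSketchException e) {
        // Not the leader: fail over to the next SCM in the list.
      }
    }
    throw new IllegalStateException("no SCM accepted the request");
  }

  public static void main(String[] args) {
    List<ScmNodeSketch> scms = List.of(
        new ScmNodeSketch("scm1", false),
        new ScmNodeSketch("scm2", true),
        new ScmNodeSketch("scm3", false));
    System.out.println(submit(scms, "adminRequest"));
  }
}
```

      In this model the loop stops at the first SCM that accepts the request, so an admin command is applied on the leader only.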

       

      Solution:

      When an SCM receives an rm stop/start request, it skips the isLeader check and simply destroys/creates the rm thread. The client side then fakes an exception to trigger fpp to try the next SCM in round-robin order, so the request reaches every SCM.
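
      A rough sketch of this broadcast follows. It is a toy model, not the actual patch: TryNextScmSketchException, ScmAdminTargetSketch and AdminBroadcastSketch are hypothetical names, a boolean stands in for the rm thread, and stopping after one full pass over the SCM list is an assumption.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical signal the client raises for itself after a successful call,
// purely to make the failover logic move on to the next SCM.
class TryNextScmSketchException extends Exception {
}

// Hypothetical, simplified SCM: a boolean stands in for the rm thread.
class ScmAdminTargetSketch {
  final String id;
  private boolean rmThreadExists = true;

  ScmAdminTargetSketch(String id) {
    this.id = id;
  }

  // Server side: no isLeader check for admin rm start/stop.
  void handleAdminRm(boolean start) {
    rmThreadExists = start;
  }

  boolean rmThreadExists() {
    return rmThreadExists;
  }
}

class AdminBroadcastSketch {
  // Client side: apply the command, then fake a failure to force failover.
  static void callOne(ScmAdminTargetSketch scm, boolean start)
      throws TryNextScmSketchException {
    scm.handleAdminRm(start);
    throw new TryNextScmSketchException();
  }

  // Assumption: the client stops after one full pass over the SCM list.
  static List<String> broadcast(List<ScmAdminTargetSketch> scms, boolean start) {
    List<String> visited = new ArrayList<>();
    for (ScmAdminTargetSketch scm : scms) {
      try {
        callOne(scm, start);
      } catch (TryNextScmSketchException e) {
        visited.add(scm.id); // command applied here, move on to the next SCM
      }
    }
    return visited;
  }

  public static void main(String[] args) {
    List<ScmAdminTargetSketch> scms = List.of(
        new ScmAdminTargetSketch("scm1"),
        new ScmAdminTargetSketch("scm2"),
        new ScmAdminTargetSketch("scm3"));
    System.out.println(broadcast(scms, false)); // admin rm stop on every SCM
  }
}
```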

      The RUNNING/PAUSING status and admin rm start/stop can be treated separately: admin operations and Raft status are requirements along two independent dimensions.
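
      One way to picture the two dimensions is a small state holder in which the admin controls whether the rm thread exists, while leadership and safe mode control the status. The names below (ServiceStatusSketch, RmStateSketch and its methods) are hypothetical, and the defaults after a restart (thread present, status PAUSING) are assumptions, not taken from the Ozone source.

```java
// Hypothetical mirror of the RUNNING / PAUSING status described above.
enum ServiceStatusSketch { RUNNING, PAUSING }

// Hypothetical per-SCM state: two independent dimensions.
class RmStateSketch {
  // Admin dimension. Assumption: the rm thread exists after a restart,
  // since admin start/stop is not persisted.
  private boolean rmThreadExists = true;

  // Raft / safe-mode dimension. Assumption: PAUSING until notified otherwise.
  private ServiceStatusSketch status = ServiceStatusSketch.PAUSING;

  void adminStart() {
    rmThreadExists = true;
  }

  void adminStop() {
    rmThreadExists = false;
  }

  // Called on leader change or safe-mode change; never touches the thread.
  void onRaftOrSafeModeChange(boolean isLeader, boolean inSafeMode) {
    status = (isLeader && !inSafeMode)
        ? ServiceStatusSketch.RUNNING
        : ServiceStatusSketch.PAUSING;
  }

  // rm does real work only when both dimensions allow it.
  boolean rmTakesEffect() {
    return rmThreadExists && status == ServiceStatusSketch.RUNNING;
  }

  public static void main(String[] args) {
    RmStateSketch scm = new RmStateSketch();
    scm.adminStop();
    scm.onRaftOrSafeModeChange(true, false); // elected leader after admin stop
    System.out.println(scm.rmTakesEffect()); // false: no rm thread to run
  }
}
```

      In this model a re-election only flips the status and cannot recreate a thread the admin has destroyed, and an admin start on a follower or on a leader in safe mode creates a thread that stays in PAUSING.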

       

      This meets the requirements above:

      1. When the admin stops rm, rm must stop in every SCM, and a re-election must not cause rm to start on the new leader.

      Met: admin rm stop destroys the rm thread in every SCM.

       

      2. When the admin starts rm, it takes effect only on the SCM that is the leader and out of safe mode. If the leader is in safe mode, an explicit admin start has no effect.

      Met: admin rm start creates the rm thread in every SCM, but the SCMService status (RUNNING or PAUSING) is decided by leadership and safe mode.

       

      3. An admin rm start/stop does not survive a restart of an SCM instance. After stopping rm for the SCM cluster, the admin must watch for SCM crashes.

      Met. This is actually a relaxed requirement.

              People

              • Assignee:
                glengeng Glen Geng
                Reporter:
                glengeng Glen Geng