Apache Ozone / HDDS-2823

SCM HA Support


    Details

    • Type: New Feature
    • Status: Open
    • Priority: Major
    • Resolution: Unresolved
    • Affects Version/s: None
    • Fix Version/s: None
    • Component/s: SCM HA
    • Target Version/s:

      Description

      OM HA is now close to feature complete. It is time to support SCM HA so that the system has no single point of failure (SPoF).

       

      Design doc: https://docs.google.com/document/d/1vr_z6mQgtS1dtI0nANoJlzvF1oLV-AtnNJnxAgg69rM/edit?usp=sharing


          Sub-Tasks

          1. Support SCM HA in MiniOzoneHACluster (Sub-task, Open, Rui Wang)
          2. PipelineStateManagerV2Impl#removePipeline will remove pipeline from db in case of failure (Sub-task, Open, Unassigned)
          3. Backport updates from ContainerManager(V1) (Sub-task, Open, Unassigned)
          4. Backport updates from PipelineManager(V1) (Sub-task, Open, Unassigned)
          5. Use suggestedLeader for SCM failover proxy performing failover (Sub-task, Open, Unassigned)
          6. Add unit test for SCMHAInvocationHandler (Sub-task, Open, Nanda kumar)
          7. Handle pipeline reports (Sub-task, Open, Unassigned)
          8. Handle ContainerAction and CloseContainer (Sub-task, Open, Unassigned)
          9. Arrange Util classes for SCM HA (Sub-task, Open, Nanda kumar)
          10. SCM CLI command towards certain IP (Sub-task, Open, Unassigned)
          11. Update javadoc in SCMHA related classes (Sub-task, Open, Nanda kumar)
          12. Revisit SCM client retry and failover when SCM leader changes (Sub-task, Open, Shashikant Banerjee)
          13. Design for SCM HA configuration (Sub-task, Open, Unassigned)
          14. Provide docker-compose for SCM HA (Sub-task, Open, Unassigned)
          15. Refactor out Ratis logic chain (Sub-task, Open, Unassigned)
          16. SafeMode exit rule for all SCMs (Sub-task, Open, Unassigned)
          17. Decommission can only be executed on the leader (Sub-task, Open, Rui Wang)
          18. CLI for SCMs info (Sub-task, Open, Unassigned)
          19. Design for Error/Exception handling in state update for container/pipeline V2 (Sub-task, Open, Glen Geng)
          20. Add unit test for container operation in ContainerManagerImpl (Sub-task, Open, Nanda kumar)
          21. Replace scmID with clusterID for container and volume on the Datanode side (Sub-task, Open, Glen Geng)
          22. In ContainerStateManagerV2, modification of RocksDB should be consistent with that of memory state. (Sub-task, Open, Glen Geng)
          23. Fix Recon after HDDS-4133 (Sub-task, Patch Available, Nanda kumar)
          24. TestSCMStateMachine (Sub-task, Open, Unassigned)
          25. SCMBlockLocationFailoverProxyProvider should handle LeaderNotReadyException (Sub-task, Open, Rui Wang)
          26. Testing Infrastructure Random Failures (Sub-task, Open, Unassigned)
          27. SCM HA needs to handle the generation of clusterID and scmUuid in a decent way. (Sub-task, Open, Unassigned)
          28. Add unit test to prove that datanode can handle term in SCMCommand properly (Sub-task, Open, Unassigned)
          29. FailoverProxyProvider of SCM client should support leaderHint. (Sub-task, Open, Rui Wang)
          30. Handle inflight delete/add actions in ReplicationManager properly. (Sub-task, Open, YI-CHEN WANG)
          31. Handle backward compatibility when upgrading from non-HA to HA (Sub-task, Open, Rui Wang)
          32. Implement InstallSnapshot for SCM HA (Sub-task, Open, Shashikant Banerjee)
          33. Disallow committing to DB by getCurrentBatchOperation() (Sub-task, Open, Unassigned)
          34. Add ratis snapshot retention policy for SCM HA (Sub-task, Open, Shashikant Banerjee)
          35. Better handle setting a trx that is earlier than the latest trx in SCMDBTransactionBuffer (Sub-task, Open, Rui Wang)
          36. Temporarily ignore failing Recon tests (Sub-task, Open, Nanda kumar)
          37. Use OM style config to construct RaftGroup and initialize Raft Servers (Sub-task, Open, Rui Wang)
          38. Merge SCM HA Configuration (Sub-task, Open, Bharat Viswanadham)
          39. Handle NotLeaderException with Event Queue Handlers (Sub-task, Open, Unassigned)
          40. Retry policy for SCM requests over ratis (Sub-task, Open, Shashikant Banerjee)
          41. Add integration test for SequenceIdGen (Sub-task, Open, Unassigned)
          42. Add SCM to Ratis Log Parser (Sub-task, Open, Mukul Kumar Singh)
          43. Adapt admincli tests for SCM HA (Sub-task, Open, Attila Doroszlai)
          44. [SCM HA Security] generate certserialID in distributed sequence (Sub-task, Open, Unassigned)
          45. During bootstrap, always download checkpoint from leader SCM. (Sub-task, Open, Unassigned)
          46. [SCM HA Security] Make upgraded cluster to ratis enabled single node cluster (Sub-task, Open, Bharat Viswanadham)
          47. Merge SCM HA configs to ScmConfigKeys (Sub-task, Open, Unassigned)
          48. [SCM HA Security] Make InterSCM grpc channel secure (Sub-task, Open, Bharat Viswanadham)
          49. [SCM HA Security] Remove code of not starting ozone services when Security is enabled on SCM HA cluster (Sub-task, Open, Bharat Viswanadham)
          50. NPE during secure SCM initialization with HA code updated to an already existing cluster (Sub-task, Open, Bharat Viswanadham)
          51. Ensure failover to suggested leader if any for NotLeaderException (Sub-task, Open, Shashikant Banerjee)


              People

              • Assignee: Li Cheng
              • Reporter: Sammi Chen
              • Votes: 0
              • Watchers: 27


                  Time Tracking

                  • Estimated: Not Specified
                  • Remaining: 0h
                  • Logged: 40m