Apache Ozone / HDDS-3698

Ozone Non-Rolling upgrades

Details

    • Type: New Feature
    • Status: Resolved
    • Priority: Major
    • Resolution: Fixed

    Description

      Support for Non-rolling upgrades in Ozone.

      Attachments

        1. OM Prepare Upgrade.pdf (143 kB, Aravindan Vijayan)
        2. Ozone Non-Rolling Upgrades.pdf (228 kB, Aravindan Vijayan)
        3. Ozone Non-Rolling Upgrades (Presentation).pdf (115 kB, Aravindan Vijayan)
        4. Ozone Non-Rolling Upgrades Doc v1.1.pdf (325 kB, Aravindan Vijayan)
        5. Ozone Non-Rolling Upgrades Doc v1.2 (Implemented Design).pdf (504 kB, Aravindan Vijayan)

        Issue Links

        1. Introduce Layout Feature interface in Ozone (Sub-task, Resolved, Aravindan Vijayan)
        2. Introduce OM layout version 'v0'. (Sub-task, Resolved, Stephen O'Donnell)
        3. Introduce SCM layout version 'v0'. (Sub-task, Resolved, Stephen O'Donnell)
        4. Add current layout version to OM Ratis Request (Sub-task, Resolved, Aravindan Vijayan)
        5. Implement Finalize command in Ozone Manager client. (Sub-task, Resolved, István Fajth)
        6. Expose upgrade related state through JMX (Sub-task, Resolved, Ethan Rose)
        7. Implement a factory for OM Requests that returns an instance based on layout version. (Sub-task, Resolved, Aravindan Vijayan)
        8. Implement Finalize command in Ozone Manager server. (Sub-task, Resolved, István Fajth)
        9. Implement HDDS Version management using the LayoutVersionManager interface. (Sub-task, Resolved, Prashant Pogde)
        10. Add current HDDS layout version to Datanode heartbeat and registration. (Sub-task, Resolved, Prashant Pogde)
        11. Implement Datanode Finalization (Sub-task, Resolved, Prashant Pogde)
        12. SCM Finalize client command implementation. (Sub-task, Resolved, Prashant Pogde)
        13. Implement post-finalize SCM logic to allow nodes of only new version to participate in pipelines. (Sub-task, Resolved, Prashant Pogde)
        14. Schema Version field in Container metadata file should be backward compatible during read/write. (Sub-task, Resolved, Ethan Rose)
        15. Add acceptance tests for upgrade, finalization and downgrade (Sub-task, Resolved, Ethan Rose)
        16. Onboard HDDS-3869 into Layout version management (Sub-task, Resolved, Ethan Rose)
        17. Revisit 'static' nature of OM Layout Version Manager. (Sub-task, Resolved, Aravindan Vijayan)
        18. Implement a "prepareForUpgrade" step that applies all committed transactions onto the OM state machine. (Sub-task, Resolved, Aravindan Vijayan)
        19. Add the current layout versions to DN - SCM proto payload. (Sub-task, Resolved, Prashant Pogde)
        20. SCM changes to process Layout Info in register request/response (Sub-task, Resolved, Prashant Pogde)
        21. Prepare for Upgrade step should purge the log after waiting for the last txn to be applied. (Sub-task, Resolved, Aravindan Vijayan)
        22. SCM changes to process Layout Info in heartbeat request/response (Sub-task, Resolved, Prashant Pogde)
        23. OM Layout Version Manager init throws silent CNF error in integration tests. (Sub-task, Resolved, Aravindan Vijayan)
        24. Investigate Acceptance test failure in Ozone Upgrade branch. (Sub-task, Resolved, Ethan Rose)
        25. Add DataNode state and transitions for a node going through upgrade (Sub-task, Resolved, Prashant Pogde)
        26. Fix compilation issue in HDDS-3698-upgrade branch. (Sub-task, Resolved, Aravindan Vijayan)
        27. Verify that OM/SCM start fails when Software Layout Version < Metadata Layout Version (Sub-task, Resolved, Ethan Rose)
        28. Ozone Manager Prepare for Upgrade/Downgrade design (Sub-task, Resolved, Aravindan Vijayan)
        29. Implement OM Prepare Request/Response (Sub-task, Resolved, Ethan Rose)
        30. SCM restarts in the middle of the Upgrade should gracefully complete Upgrade (Sub-task, Resolved, Prashant Pogde)
        31. Add more unit tests for OM layout version manager. (Sub-task, Resolved, Aravindan Vijayan)
        32. Add a new OM admin command to submit the OMPrepareRequest. (Sub-task, Resolved, Aravindan Vijayan)
        33. Prepare client should check every OM individually for the prepared check based on Txn Id. (Sub-task, Resolved, Aravindan Vijayan)
        34. Add pre append gate and marker file to OM prepare state (Sub-task, Resolved, Ethan Rose)
        35. Merge master into HDDS-3698-upgrade branch. (Sub-task, Resolved, Prashant Pogde)
        36. Fix issues in 'prepare' operation with one OM down. (Sub-task, Resolved, Aravindan Vijayan)
        37. Add an admin command to cancel "preparation" of an OM quorum. (Sub-task, Resolved, Ethan Rose)
        38. Create OMCancelPrepareRequest and Response to cancel the prepared state of an OM. (Sub-task, Resolved, Ethan Rose)
        39. Add Integration test for HDDS upgrade (happy path cases) (Sub-task, Resolved, Prashant Pogde)
        40. Starting OM with the --upgrade flag should delete the prepare marker file. (Sub-task, Resolved, Ethan Rose)
        41. Revisit LayoutFeature, and UpgradeAction related code (Sub-task, Resolved, Aravindan Vijayan)
        42. Fresh deploy of Ozone must use the highest layout version by default (Sub-task, Resolved, Aravindan Vijayan)
        43. Add read only command to get status of Finalization in OM & SCM. (Sub-task, Resolved, Mark Gui)
        44. Datanode unable to prepare itself for finalize. (Sub-task, Resolved, Prashant Pogde)
        45. SCM should go into "safe mode" until there is at least 1 pipeline to work with after finalization. (Sub-task, Resolved, Ethan Rose)
        46. Attempting an SCM finalization after a failed / incomplete finalization. (Sub-task, Resolved, Prashant Pogde)
        47. Fix upgrade branch CI stability issues. (Sub-task, Resolved, Ethan Rose)
        48. Add Layout version information to Recon datanode info API. (Sub-task, Resolved, Aravindan Vijayan)
        49. Layout version should be available in DB for an un-finalized OM to be finalized through a Ratis snapshot. (Sub-task, Resolved, Aravindan Vijayan)
        50. Validating HDDS upgrade in presence of failures (Sub-task, Resolved, Prashant Pogde)
        51. Onboard SCM HA as a new Layout Feature into upgrades. (Sub-task, Resolved, Aravindan Vijayan)
        52. Do not wait one heartbeat to move newly registered datanodes that match SCM's MLV from HEALTHY_READONLY to HEALTHY (Sub-task, Resolved, Ethan Rose)
        53. NoSuchMethodException when wrapping RpcException on downgrade (Sub-task, Resolved, Keyi Song)
        54. Introduce First upgrade startup action and Pre-finalized state validation in Layout Feature. (Sub-task, Resolved, Aravindan Vijayan)
        55. SCM should not use pipelines with HEALTHY_READONLY datanodes (Sub-task, Resolved, Ethan Rose)
        56. Upload upgrade design documentation to docs module. (Sub-task, Resolved, Aravindan Vijayan)
        57. Merge master with SCM HA changes into upgrade branch. (Sub-task, Resolved, Aravindan Vijayan)
        58. Add pre-finalize validation action for SCM HA (Sub-task, Resolved, Ethan Rose)
        59. Attempt to remove state from *UpgradeFinalizer classes. (Sub-task, Resolved, Aravindan Vijayan)
        60. Track OM prepare intermittent integration test failure (Sub-task, Resolved, Ethan Rose)
        61. Recover from failure during upgrade action (Sub-task, Resolved, Ethan Rose)
        62. Adjust LICENSE and NOTICE files for the non-rolling upgrade branch (Sub-task, Resolved, Mark Gui)
        63. Upgrade related RPC calls should be allowed only for admins (Sub-task, Resolved, Ethan Rose)
        64. Restructure the acceptance test groups (unsecure/secure/misc) (Sub-task, Resolved, Mark Gui)
        65. Race condition in NodestateManager#addNode allows datanodes with lower MLV to be used in pipelines (Sub-task, Resolved, Ethan Rose)
        66. Merge master into HDDS-3698-upgrade branch (04/30/21). (Sub-task, Resolved, Ethan Rose)
        67. Do not fail SCM HA pre-finalize validation if SCM HA was already being used (Sub-task, Resolved, Ethan Rose)
        68. Allow multiple OM request versions to be supported at same layout version (HDDS-2939). (Sub-task, Resolved, Aravindan Vijayan)
        69. Datanodes should always use MLV 0 when no VERSION file is present (Sub-task, Resolved, Ethan Rose)
        70. Merge master branch at 12e2918 into upgrade branch (Sub-task, Resolved, Ethan Rose)
        71. Remove getRequestType method from OM request classes. (Sub-task, Resolved, Aravindan Vijayan)
        72. Fix datanode capacity related race condition (Sub-task, Resolved, Ethan Rose)
        73. Fix TestSCMNodeManager after merge of master at 1d8f972 into upgrade branch (Sub-task, Resolved, Ethan Rose)
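
Three of the sub-task titles state startup rules for layout versions: start must fail when the Software Layout Version (SLV) is lower than the Metadata Layout Version (MLV), a fresh deploy uses the highest layout version, and a datanode with no VERSION file uses MLV 0. The following is a minimal sketch of how those three rules might combine, not the Ozone implementation. The function name, its parameters, and the `fresh_deploy` flag are hypothetical and chosen only for illustration.

```python
from typing import Optional


def resolve_startup_mlv(software_layout_version: int,
                        stored_mlv: Optional[int],
                        fresh_deploy: bool = False) -> int:
    """Return the metadata layout version (MLV) a component starts with.

    Hypothetical helper: names and signature are assumptions, not Ozone APIs.
    """
    if fresh_deploy:
        # Rule from the sub-task list: a fresh deploy uses the highest
        # layout version the software knows about.
        return software_layout_version

    # Rule from the sub-task list: with no VERSION file, assume MLV 0.
    mlv = 0 if stored_mlv is None else stored_mlv

    if software_layout_version < mlv:
        # Rule from the sub-task list: start fails when SLV < MLV,
        # i.e. the on-disk metadata is newer than the software.
        raise RuntimeError(
            f"SLV {software_layout_version} is lower than MLV {mlv}; "
            "refusing to start")
    return mlv


if __name__ == "__main__":
    print(resolve_startup_mlv(3, None, fresh_deploy=True))  # fresh deploy
    print(resolve_startup_mlv(3, None))                     # no VERSION file
    print(resolve_startup_mlv(3, 2))                        # upgraded software
```

In this sketch, a returned MLV lower than the SLV corresponds to a component that has been upgraded but not yet finalized.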

          People

            Assignee: Aravindan Vijayan (avijayan)
            Reporter: Aravindan Vijayan (avijayan)
            Votes: 1
            Watchers: 20