Apache Ozone / HDDS-1880

Decommissioning and maintenance mode in Ozone


Details

    • Type: New Feature
    • Status: Resolved
    • Priority: Major
    • Resolution: Fixed
    • Affects Version/s: None
    • Fix Version/s: 1.2.0
    • Component/s: SCM

    Description

      This is the umbrella jira for decommissioning support in Ozone. Design doc will be attached soon.

      Sub-Tasks

        1. Design doc: decommissioning in Ozone (Resolved; Marton Elek; time spent 43h)
        2. Extend SCMNodeManager to support decommission and maintenance states (Resolved; Stephen O'Donnell; time spent 7h)
        3. Add CLI Commands and Protobuf messages to trigger decom states (Resolved; Stephen O'Donnell; time spent 1.5h)
        4. Extend SCMCLI Topology command to print node Operational States (Resolved; Stephen O'Donnell; time spent 20m)
        5. Destroy pipelines on any decommission or maintenance nodes (Resolved; Stephen O'Donnell; time spent 20m)
        6. QueryNode does not respect null values for opState or state (Resolved; Stephen O'Donnell; time spent 20m)
        7. ContainerReplica should contain DatanodeInfo rather than DatanodeDetails (Resolved; Stephen O'Donnell; time spent 20m)
        8. Refactor ReplicationManager to consider maintenance states (Resolved; Stephen O'Donnell; time spent 20m)
        9. DatanodeAdminMonitor should track under replicated containers and complete the admin workflow accordingly (Resolved; Stephen O'Donnell; time spent 20m)
        10. DeadNodeHandler should not remove replica for a dead maintenance node (Resolved; Stephen O'Donnell; time spent 20m)
        11. Investigate why TestDatanodeAdminMonitor.testMonitoredNodeHasPipelinesClosed() fails (Resolved; Stephen O'Donnell)
        12. Have NodeManager.getNodeStatus throw NodeNotFoundException (Resolved; Stephen O'Donnell; time spent 20m)
        13. Remove methods of internal representation from DatanodeAdminMonitor interface (Resolved; Marton Elek; time spent 20m)
        14. Add Datanode command to allow the datanode to persist its admin state (Resolved; Stephen O'Donnell; time spent 20m)
        15. Allow SCM webUI to show decommission and maintenance nodes (Resolved; Unassigned)
        16. Consider allowing maintenance end time to be specified in human readable format (Open; pratap chandu; time spent 10m)
        17. Update JMX metrics for node count in SCMNodeMetrics for Decommission and Maintenance (Resolved; Stephen O'Donnell; time spent 20m)
        18. Allow users to pass hostnames or IP when decommissioning nodes (Open; Unassigned)
        19. Expose decommission / maintenance metrics via JMX (Resolved; Neil Joshi)
        20. Merge MockNodeManager and SimpleMockNodeManager (Resolved; Stephen O'Donnell)
        21. Cluster disk space metrics should reflect decommission and maintenance states (Resolved; Stephen O'Donnell; time spent 10m)
        22. Add some unit tests around the changes in HDDS-2592 (Resolved; Unassigned)
        23. Consider using INFINITY in decommission and maintenance commands where no time is specified (Resolved; Unassigned)
        24. Merge Master branch into decom branch (Resolved; Stephen O'Donnell; time spent 10m)
        25. Investigate failure of TestDecommissionAndMaintenance integration test (Resolved; Stephen O'Donnell)
        26. Change replication logic to use PersistedOpState (Resolved; Stephen O'Donnell; time spent 0.5h)
        27. Remove no longer needed class DatanodeAdminNodeDetails (Resolved; Stephen O'Donnell)
        28. Add integration tests for Decommission and resolve issues detected by the tests (Resolved; Stephen O'Donnell)
        29. Add integration tests for putting nodes into maintenance and fix any issues uncovered in the tests (Resolved; Stephen O'Donnell)
        30. DatanodeAdminMonitor no longer needs maintenance end time to be passed (Resolved; Stephen O'Donnell)
        31. Add Operational State to the datanode list command (Resolved; Stephen O'Donnell)
        32. Show Datanode OperationalState (IN_SERVICE/DECOMMISSION/MAINTENANCE) in Recon (Resolved; Siyao Meng)
        33. SCM can incorrectly mark Datanode as DECOMMISSIONING when Datanode is not fully initialized (Closed; Unassigned)
        34. Update NodeStatus OperationalState for Datanodes in Recon (Resolved; Siyao Meng)
        35. Add line break when node has no pipelines for `ozone admin datanode list` command (Resolved; Siyao Meng)
        36. Improve Ozone admin shell decommission/recommission/maintenance commands user experience (Resolved; Siyao Meng)


          People

            Assignee: Stephen O'Donnell (sodonnell)
            Reporter: Marton Elek (elek)
            Votes: 0
            Watchers: 12

            Dates

              Created:
              Updated:
              Resolved:

              Time Tracking

                Estimated: Not Specified
                Remaining: 0h
                Logged: 56h 10m
