  1. Apache Ozone
  2. HDDS-8699 Further Replication Manager Improvements
  3. HDDS-9125

Decommissioning blocked because of under-replicated EC containers

Details

    • Sub-task
    • Status: Resolved
    • Major
    • Resolution: Fixed
    • 1.4.0
    • 1.4.0
    • ECOfflineRecovery, SCM
    • None

    Description

      The situation is documented under the heading 'Situation 4' in https://docs.google.com/document/d/1ebuSwJZkw4wMWWCHinDvRCfNbeFD4kcHMyIN6Q6wD9g/edit?usp=sharing. It arises from limitations in the rack scatter policy combined with the replication manager flow. One possible solution is to implement a "fallback" in the rack scatter policy. In addition to the doc, this PR is related: https://github.com/apache/ozone/pull/5097.

      An example (summary) of this situation:
      Suppose there are 5 racks and 6 DNs, so that exactly one rack has 2 DNs and every other rack has 1. The 5 replicas of an EC container are spread across the 5 racks, one replica per rack. Now, if the DN on any of the single-DN racks is decommissioned, under-replication handling is blocked. The only DN that does not already hold a replica is the second DN on the two-DN rack, and that rack already holds a replica.
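
      The example can be sketched as a toy model. This is only an illustration and not Ozone's actual placement code: the topology, the node and rack names, and the function choose_target with its fallback flag are all hypothetical. It assumes a strict policy that accepts a target only on a rack with no remaining replica, and a fallback that accepts any free DN.

```python
from typing import Dict, List, Optional, Set

# Hypothetical topology from the example: 5 racks, 6 DNs, one rack with 2 DNs.
TOPOLOGY: Dict[str, List[str]] = {
    "rack1": ["dn1", "dn2"],
    "rack2": ["dn3"],
    "rack3": ["dn4"],
    "rack4": ["dn5"],
    "rack5": ["dn6"],
}


def rack_of(dn: str) -> str:
    """Return the rack that hosts the given datanode."""
    for rack, dns in TOPOLOGY.items():
        if dn in dns:
            return rack
    raise KeyError(dn)


def choose_target(replica_nodes: Set[str], decommissioning: str,
                  fallback: bool) -> Optional[str]:
    """Pick a DN to receive the replica that lives on the decommissioning DN.

    Strict mode (an assumption of this sketch): the target's rack must not
    hold any of the remaining replicas. With fallback=True, any DN that does
    not already hold a replica is accepted when strict placement fails.
    """
    remaining = replica_nodes - {decommissioning}
    used_racks = {rack_of(dn) for dn in remaining}
    candidates = [
        dn
        for dns in TOPOLOGY.values()
        for dn in dns
        if dn not in replica_nodes and dn != decommissioning
    ]
    for dn in candidates:
        if rack_of(dn) not in used_racks:
            return dn
    if fallback and candidates:
        return candidates[0]
    return None  # no acceptable target: replication is blocked


# One replica per rack; dn2 is the only DN without a replica.
REPLICAS = {"dn1", "dn3", "dn4", "dn5", "dn6"}

blocked = choose_target(REPLICAS, "dn3", fallback=False)
with_fallback = choose_target(REPLICAS, "dn3", fallback=True)
two_dn_rack = choose_target(REPLICAS, "dn1", fallback=False)
```

      In this model, decommissioning dn3 (a single-DN rack) finds no target under the strict rule, while the fallback variant settles for dn2. Decommissioning dn1 is not blocked, because dn2 sits on the same rack and that rack no longer holds a remaining replica.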

            People

              sodonnell Stephen O'Donnell
              siddhant Siddhant Sangwan
