Apache Ozone / HDDS-9254

Legacy replication manager uses mismatched replicas as replication sources


Details

    • Type: Bug
    • Status: Resolved
    • Priority: Major
    • Resolution: Fixed
    • Affects Version/s: None
    • Fix Version/s: 1.4.0
    • Component/s: None

    Description

      Consider a CLOSED container in SCM whose replicas are CLOSED, CLOSED and QUASI_CLOSED. In the pull replication model, the legacy Replication Manager (RM) sends all three replicas to the target datanode as replication sources, including the QUASI_CLOSED replica whose state does not match the container's. The datanode (DN) shuffles the sources randomly and copies from one of them. If it picks the QUASI_CLOSED replica, the next RM iteration sees CLOSED, CLOSED, QUASI_CLOSED, QUASI_CLOSED. Because the container still has only two CLOSED replicas, RM considers it under-replicated and issues the same command again, but now half of the offered sources are quasi-closed, so the DN is even more likely to pick one. This can repeat until every datanode in the cluster holds a quasi-closed replica, which can bring the cluster into the stuck state described in HDDS-8536.
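      The feedback loop above can be illustrated with a small simulation. This is a minimal Python sketch, not Ozone code: `ReplicaState`, `legacy_sources`, `matching_sources`, `dn_pick_source` and `simulate` are hypothetical names. It assumes that a new replica inherits the state of the source it was copied from, that each replication lands on a datanode without a replica, and that RM stops sending commands once three CLOSED replicas exist. `matching_sources` illustrates the direction suggested by the issue title (offer only replicas whose state matches the container); it is not necessarily how the actual fix is implemented.

```python
import random
from enum import Enum


class ReplicaState(Enum):
    CLOSED = "CLOSED"
    QUASI_CLOSED = "QUASI_CLOSED"


def legacy_sources(container_state, replicas):
    """Legacy RM behaviour described above: every replica is offered as a source."""
    return list(replicas)


def matching_sources(container_state, replicas):
    """Hypothetical fix: offer only replicas whose state matches the container."""
    return [r for r in replicas if r == container_state]


def dn_pick_source(sources, rng):
    """Datanode side: shuffle the offered sources and copy from the first one."""
    candidates = list(sources)
    rng.shuffle(candidates)
    return candidates[0]


def simulate(select_sources, num_datanodes, rng, required=3):
    """Run RM iterations until the container has `required` replicas matching
    its state, or every datanode already holds a replica."""
    container_state = ReplicaState.CLOSED
    replicas = [ReplicaState.CLOSED, ReplicaState.CLOSED, ReplicaState.QUASI_CLOSED]
    while len(replicas) < num_datanodes:
        healthy = sum(1 for r in replicas if r == container_state)
        if healthy >= required:
            break  # no longer under-replicated, so RM sends no further command
        sources = select_sources(container_state, replicas)
        if not sources:
            break  # nothing with a matching state to copy from
        # Assumption: the new copy has the same state as the chosen source.
        replicas.append(dn_pick_source(sources, rng))
    return replicas


if __name__ == "__main__":
    rng = random.Random(42)
    for selector in (legacy_sources, matching_sources):
        result = simulate(selector, num_datanodes=20, rng=rng)
        quasi = sum(1 for r in result if r == ReplicaState.QUASI_CLOSED)
        print(f"{selector.__name__}: {len(result)} replicas, {quasi} quasi-closed")
```

      With `legacy_sources`, every unlucky pick adds another QUASI_CLOSED source and raises the odds of the next unlucky pick; in the worst case the loop only ends when no datanode is left. With `matching_sources`, the first replication always copies a CLOSED replica, so the container reaches three CLOSED replicas after a single command.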


People

    • Assignee: Ethan Rose
    • Reporter: Ethan Rose
    • Votes: 0
    • Watchers: 1
