  1. Apache Ozone
  2. HDDS-7823 SCM HA Phase 2
  3. HDDS-4599

Handle inflight delete/add actions in ReplicationManager properly.

Details

    • Type: Sub-task
    • Status: Open
    • Priority: Major
    • Resolution: Unresolved
    • Affects Version/s: 1.1.0
    • Fix Version/s: None
    • Component/s: SCM HA
    • Labels: None

    Description

      ReplicationManager tracks in-flight replication and deletion actions in memory only; this state is not replicated through Ratis. In theory, if ReplicationManager starts immediately after a failover, the new leader SCM has no record of those actions, which can lead to data loss or over-replication.
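To make the failure mode concrete, here is a minimal sketch of the in-flight delete case. It is a simplified model, not the real ReplicationManager: the class `InflightDeleteSketch`, the method `handleOverReplicated`, the fixed replication factor of 3, and the "first replica in report order" selection rule are all assumptions made for illustration.

```java
import java.util.HashMap;
import java.util.HashSet;
import java.util.List;
import java.util.Map;
import java.util.Set;

// Hypothetical, simplified model of the problem; not the real ReplicationManager.
class InflightDeleteSketch {
    static final int REPLICATION_FACTOR = 3;

    // Lives only in this SCM's memory, so a new leader starts with it empty.
    final Map<Long, Set<String>> inflightDeletion = new HashMap<>();

    // Picks one datanode to delete an excess replica from, or returns null if
    // the container is not over-replicated once pending deletes are counted.
    String handleOverReplicated(long containerId, List<String> reportedReplicas) {
        Set<String> pending =
            inflightDeletion.computeIfAbsent(containerId, k -> new HashSet<>());
        // Replicas that are reported and not already scheduled for deletion.
        long effective = reportedReplicas.stream()
            .filter(dn -> !pending.contains(dn))
            .count();
        if (effective <= REPLICATION_FACTOR) {
            return null;
        }
        // Assumed selection rule: first reported replica with no pending delete.
        for (String dn : reportedReplicas) {
            if (pending.add(dn)) {
                return dn;
            }
        }
        return null;
    }
}
```

In this model, an old leader that sees replicas on dn1 to dn4 schedules one delete and then stops, because it remembers the pending action. A new leader built from a fresh `InflightDeleteSketch` sees the same four replicas with an empty map and schedules a second delete; if its report order differs, it may pick a different datanode, and executing both deletes would leave two replicas, below the assumed factor of 3.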

      There is a quick fix for the potential data loss issue (HDDS-4589); however, we still need a thorough solution for both in-flight add and in-flight delete.

      We have two proposals from sodonnell:

      1. Have the DNs include a list of pending_delete blocks in their container report / heartbeat, which SCM can then use.
      2. If the DNs detect a new master SCM or a restarted SCM, they purge their pending delete list and wait for new instructions from the new/restarted SCM.
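The two proposals could be sketched roughly as follows. Everything here is hypothetical: `DatanodeSketch`, `ScmSketch`, `onScmInstance`, and the opaque `scmInstanceToken` string are illustrative names, not Ozone APIs, and pending deletes are keyed by container ID as a simplifying assumption (the proposal itself speaks of pending_delete blocks). How a DN would actually detect a new or restarted SCM is left open in the issue.

```java
import java.util.HashMap;
import java.util.HashSet;
import java.util.Map;
import java.util.Set;

// Hypothetical datanode-side state; names are illustrative, not Ozone APIs.
class DatanodeSketch {
    final String id;
    // Deletes received from SCM but not yet executed (assumed keyed by container ID).
    final Set<Long> pendingDeletes = new HashSet<>();
    // Assumed opaque token that changes when SCM leadership changes or SCM restarts.
    private String knownScmInstance;

    DatanodeSketch(String id) {
        this.id = id;
    }

    // Proposal 1: include the pending delete list in the report / heartbeat.
    Set<Long> pendingDeleteReport() {
        return new HashSet<>(pendingDeletes);
    }

    // Proposal 2: purge pending deletes when a new or restarted SCM is detected.
    // Returns true if a purge happened.
    boolean onScmInstance(String scmInstanceToken) {
        boolean changed = knownScmInstance != null
            && !knownScmInstance.equals(scmInstanceToken);
        knownScmInstance = scmInstanceToken;
        if (changed) {
            pendingDeletes.clear();
        }
        return changed;
    }
}

// Hypothetical SCM-side state rebuilt from datanode reports (proposal 1).
class ScmSketch {
    final Map<Long, Set<String>> inflightDeletion = new HashMap<>();

    void processReport(String datanodeId, Set<Long> reportedPendingDeletes) {
        for (long containerId : reportedPendingDeletes) {
            inflightDeletion
                .computeIfAbsent(containerId, k -> new HashSet<>())
                .add(datanodeId);
        }
    }
}
```

Under proposal 1 a new leader repopulates its in-flight deletion view from reports before acting; under proposal 2 the DNs discard stale work so the new leader's empty view is already accurate. The sketch covers only in-flight delete; in-flight add would need separate handling.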

      This Jira is filed to record the problem.

          People

            a493172422 YI-CHEN WANG
            glengeng Glen Geng
            Votes: 0
            Watchers: 2