  1. Apache Ozone
  2. HDDS-6462 Phase II : Erasure Coding Offline Recovery & Read/Write Improvements
  3. HDDS-7462

EC: Fix Reconstruction Issue with StaleRecoveringContainerScrubbingService


Details

    • Type: Sub-task
    • Status: Resolved
    • Priority: Critical
    • Resolution: Fixed
    • Affects Version/s: None
    • Fix Version/s: 1.4.0
    • Component/s: None

    Description

      EC reconstruction (via the WriteChunk operation) recreates the container in the OPEN state with replica index 0 when StaleRecoveringContainerScrubbingService deletes the recovering container, so an invalid container with replica index 0 is created. This could cause an SCM failure when the container is reported in a heartbeat, and could also leave a partially reconstructed container when a new block is written while the recovering container is being deleted.

      Marking the recovering container as unhealthy, rather than deleting it, should fix the issue. The Reconstruction Coordinator should then handle the resulting failure and clean up the stale container by deleting the unhealthy container.
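
      The race and the proposed fix can be illustrated with a toy model. This is only a sketch, not Ozone code: the class RecoveringContainerSketch and every method in it (createRecovering, writeChunk, scrubByDeleting, scrubByMarkingUnhealthy, reconstruct) are hypothetical names chosen for illustration, and the model assumes that a write to a missing container implicitly creates an OPEN container with replica index 0, as the description states.

```java
import java.util.HashMap;
import java.util.Map;

// Toy model only; none of these names are real Ozone classes or methods.
final class RecoveringContainerSketch {

  enum State { RECOVERING, OPEN, UNHEALTHY }

  static final class Container {
    final long id;
    final int replicaIndex;
    State state;

    Container(long id, int replicaIndex, State state) {
      this.id = id;
      this.replicaIndex = replicaIndex;
      this.state = state;
    }
  }

  /** Minimal stand-in for a datanode's container set. */
  static final class Datanode {
    final Map<Long, Container> containers = new HashMap<>();

    void createRecovering(long id, int replicaIndex) {
      containers.put(id, new Container(id, replicaIndex, State.RECOVERING));
    }

    // Assumed write path: a missing container is implicitly created as
    // OPEN with replica index 0; an UNHEALTHY container rejects the write.
    Container writeChunk(long id) {
      Container c = containers.get(id);
      if (c == null) {
        c = new Container(id, 0, State.OPEN);
        containers.put(id, c);
      } else if (c.state == State.UNHEALTHY) {
        throw new IllegalStateException("container " + id + " is UNHEALTHY");
      }
      return c;
    }

    // Problematic scrubbing: the stale recovering container simply vanishes.
    void scrubByDeleting(long id) {
      containers.remove(id);
    }

    // Proposed scrubbing: keep the container but mark it UNHEALTHY.
    void scrubByMarkingUnhealthy(long id) {
      Container c = containers.get(id);
      if (c != null && c.state == State.RECOVERING) {
        c.state = State.UNHEALTHY;
      }
    }

    void delete(long id) {
      containers.remove(id);
    }
  }

  // Coordinator side: on a failed write, clean up the stale container.
  static boolean reconstruct(Datanode dn, long id) {
    try {
      dn.writeChunk(id);
      return true;
    } catch (IllegalStateException e) {
      dn.delete(id);
      return false;
    }
  }

  public static void main(String[] args) {
    Datanode buggy = new Datanode();
    buggy.createRecovering(1L, 3);
    buggy.scrubByDeleting(1L);
    Container c = buggy.writeChunk(1L);
    System.out.println("delete-based scrub: state=" + c.state
        + " replicaIndex=" + c.replicaIndex);

    Datanode fixed = new Datanode();
    fixed.createRecovering(1L, 3);
    fixed.scrubByMarkingUnhealthy(1L);
    boolean ok = reconstruct(fixed, 1L);
    System.out.println("unhealthy-based scrub: ok=" + ok
        + " remaining=" + fixed.containers.size());
  }
}
```

      In this model, the delete-based scrub lets a late write resurrect the container with replica index 0, while the unhealthy-based scrub makes the write fail so the coordinator can remove the stale container explicitly.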

            People

              Assignee: swamirishi Swaminathan Balachandran
              Reporter: swamirishi Swaminathan Balachandran
              Votes: 0
              Watchers: 1
