Apache Ozone / HDDS-1595

Handling IO Failures on the Datanode

Details

    • Type: Improvement
    • Status: Resolved
    • Priority: Major
    • Resolution: Done
    • Affects Version/s: None
    • Fix Version/s: None
    • Component/s: Ozone Datanode
    • Labels: None

    Description

      This Jira covers all the changes required to handle IO failures on the Datanode. Handling an IO failure on the Datanode involves detecting the failure as it happens and propagating it to the appropriate component in the system: the Client, the SCM, or both, depending on the type of failure.
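
      As a rough illustration of detect-and-propagate, the sketch below assumes a write path in which every failed request is reported back to the client, and failures that look volume-level are additionally reported to the SCM. The names (`Datanode`, `Reply`, `handle_write`, `is_volume_failure`) and the errno-based classification are hypothetical; they are not Ozone's actual API or failure taxonomy.

```python
import errno
from dataclasses import dataclass, field
from typing import Callable, List

# Hypothetical: errno values treated as a sign that the whole volume is bad.
# A real classifier would likely need more signals than a single errno.
VOLUME_LEVEL_ERRNOS = {errno.EIO, errno.EROFS}


@dataclass
class Reply:
    """Hypothetical reply returned to the client."""
    ok: bool
    error: str = ""


def is_volume_failure(exc: OSError) -> bool:
    """Assumed rule: some errno values indicate a volume-level failure."""
    return exc.errno in VOLUME_LEVEL_ERRNOS


@dataclass
class Datanode:
    # Hypothetical stand-in for the Datanode-to-SCM report channel.
    scm_reports: List[str] = field(default_factory=list)

    def handle_write(self, volume: str, data: bytes,
                     write_fn: Callable[[str, bytes], None]) -> Reply:
        """Perform a write and propagate any IO failure instead of hiding it."""
        try:
            write_fn(volume, data)
        except OSError as exc:
            # Assumed routing: volume-level failures also go to the SCM.
            if is_volume_failure(exc):
                self.scm_reports.append(f"volume {volume} failed: {exc}")
            # The client always learns that its request failed.
            return Reply(ok=False, error=str(exc))
        return Reply(ok=True)
```

      Under these assumptions, a request-level error such as ENOSPC reaches only the client, while an EIO also produces a report for the SCM.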

      At a high level, IO failure handling has the following goals:
      1. Prevent inconsistencies and corruption caused by unhandled or mishandled failures.
      2. Prevent data loss: detect failures promptly and propagate the correct error back to the initiator, instead of silently dropping the data while the client assumes the operation has been committed.
      3. Contain the disruption in the system: if a disk volume fails on a Datanode (DN), operations on other nodes and volumes should not be affected.
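
      Goals 2 and 3 can be sketched together under some simplifying assumptions: a hypothetical per-Datanode volume tracker acknowledges a write only after it succeeds, takes a failing volume out of rotation, and returns the error to the caller instead of dropping it. `VolumeSet`, `write_chunk`, and the round-robin volume choice are illustrative assumptions, not Ozone classes or policies.

```python
from typing import Callable, List, Set, Tuple


class VolumeSet:
    """Hypothetical volume tracker: failed volumes leave the rotation,
    healthy volumes keep serving requests."""

    def __init__(self, names: List[str]) -> None:
        self.healthy: List[str] = list(names)
        self.failed: Set[str] = set()
        self._next = 0

    def mark_failed(self, name: str) -> None:
        if name in self.healthy:
            self.healthy.remove(name)
            self.failed.add(name)

    def choose(self) -> str:
        # Assumed round-robin choice over the remaining healthy volumes.
        if not self.healthy:
            raise RuntimeError("no healthy volumes left on this Datanode")
        name = self.healthy[self._next % len(self.healthy)]
        self._next += 1
        return name


def write_chunk(volumes: VolumeSet, data: bytes,
                write_fn: Callable[[str, bytes], None]) -> Tuple[bool, str, str]:
    """Return (acknowledged, volume, error). Acknowledge only on success;
    on failure, isolate the volume and surface the error to the caller."""
    vol = volumes.choose()
    try:
        write_fn(vol, data)
    except OSError as exc:
        volumes.mark_failed(vol)      # contain the failure to this volume
        return False, vol, str(exc)   # never report an unwritten chunk as committed
    return True, vol, ""
```

      In this sketch, once `disk1` fails, later writes go to `disk2` and `disk3`, so one bad disk does not block the rest of the node.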

      Details of the design and the required changes are covered in the attached PDF document.
      A sequence diagram used to analyse the Datanode IO path is also attached, in SVG format.

      Attachments

        1. Handling IO Failures on the Datanode.pdf (371 kB, Supratim Deka)
        2. Raft IO v2.svg (88 kB, Supratim Deka)

    People

      Assignee: Supratim Deka (sdeka)
      Reporter: Supratim Deka (sdeka)
      Votes: 0
      Watchers: 4

    Time Tracking

      Original Estimate: Not Specified
      Remaining Estimate: 0h
      Time Spent: 4.5h