Hadoop HDFS / HDFS-795

DFS write pipeline does not detect the defective datanode correctly in some cases (HADOOP-3339)

    Details

    • Type: Bug
    • Status: Resolved
    • Priority: Critical
    • Resolution: Duplicate
    • Affects Version/s: 0.20.1
    • Fix Version/s: 0.20.2
    • Component/s: hdfs-client
    • Labels: None

      Description

      The HDFS write pipeline does not select the correct datanode in some error cases. For example, say DN2 is the second datanode and a write to it times out because it is in a bad state: the pipeline actually removes the first datanode instead. If such a datanode happens to be the last one in the pipeline, the write is aborted completely with a hard error.

      Essentially, the error occurs when a write to a downstream datanode fails, rather than a read. This bug was actually fixed in 0.18 (HADOOP-3339), but HADOOP-1700 essentially reverted the fix. I am not sure why.
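
      To make the failure mode concrete, here is a toy model of the decision, not the actual DFSClient code. The class, enum, and method names are all hypothetical, and real pipeline recovery is more involved. It assumes the node that observes the failure also reports what kind of failure it saw:

```java
import java.util.ArrayList;
import java.util.List;

// Toy model only: none of these names exist in HDFS.
public class PipelineRecoverySketch {

    /** Hypothetical failure kinds a datanode in the pipeline might observe. */
    enum FailureKind { READ_FROM_UPSTREAM, WRITE_TO_DOWNSTREAM }

    /** Naive policy: always blame the node that reported the error. */
    static int naiveSuspect(int reporterIndex) {
        return reporterIndex;
    }

    /**
     * Downstream-aware policy: if the reporter failed while writing to its
     * downstream neighbour, blame that neighbour instead of the reporter.
     */
    static int downstreamAwareSuspect(int reporterIndex, FailureKind kind, int pipelineSize) {
        if (kind == FailureKind.WRITE_TO_DOWNSTREAM && reporterIndex + 1 < pipelineSize) {
            return reporterIndex + 1;
        }
        return reporterIndex;
    }

    /** Returns a copy of the pipeline with the node at suspectIndex removed. */
    static List<String> without(List<String> pipeline, int suspectIndex) {
        List<String> remaining = new ArrayList<>(pipeline);
        remaining.remove(suspectIndex); // int argument, so this removes by index
        return remaining;
    }

    public static void main(String[] args) {
        List<String> pipeline = List.of("DN1", "DN2", "DN3");
        // DN1 (index 0) times out while writing to DN2, which is in a bad state.
        int naive = naiveSuspect(0);
        int aware = downstreamAwareSuspect(0, FailureKind.WRITE_TO_DOWNSTREAM, pipeline.size());
        System.out.println("naive:            " + without(pipeline, naive));
        System.out.println("downstream-aware: " + without(pipeline, aware));
    }
}
```

      In this model the naive policy drops the healthy DN1 and keeps the bad DN2, which is the behavior described above; the downstream-aware policy drops DN2.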

      It is absolutely essential for HDFS to handle failures on a subset of the datanodes in a pipeline. At the very least, we should not have known bugs that lead to hard failures.

      I will attach a patch with a hack that illustrates this problem. I am still thinking about what an automated test for this one would look like.

      My preferred target for this fix is 0.20.1.

              People

              • Assignee: Unassigned
              • Reporter: Raghu Angadi (rangadi)
              • Votes: 0
              • Watchers: 7
