Hadoop Common / HADOOP-3339

DFS Write pipeline does not detect defective datanode correctly if it times out.

    Details

    • Type: Bug
    • Status: Closed
    • Priority: Major
    • Resolution: Fixed
    • Affects Version/s: 0.16.0
    • Fix Version/s: 0.18.0
    • Component/s: None
    • Labels: None
    • Hadoop Flags:
      Reviewed
    • Release Note:
      Improved failure handling of last Data Node in write pipeline.

      Description

      When DFSClient is writing to DFS, it does not correctly detect the culprit datanode (rather, the datanodes do not report it) if the bad node times out. Say the last datanode in a 3-node pipeline is too slow or defective. In this case, the pipeline removes the first two datanodes in the first two attempts. The third attempt has only the 3rd datanode in the pipeline and it will fail too. If the pipeline detects the bad 3rd node when the first failure occurs, the write will succeed in the second attempt.

      I will attach example logs of such cases. I think this should be fixed in 0.17.x.
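      The retry behaviour described above can be illustrated with a small, self-contained sketch. This is illustrative only, not the actual DFSClient recovery code; the node names and the retry loop are hypothetical and only mirror the scenario in the description (defective last node, wrong node blamed on each failure).

      import java.util.ArrayList;
      import java.util.List;

      public class PipelineRetrySketch {
        public static void main(String[] args) {
          // dn3 is the slow/defective last node in the 3-node pipeline.
          List<String> pipeline = new ArrayList<>(List.of("dn1", "dn2", "dn3"));
          for (int attempt = 1; !pipeline.isEmpty(); attempt++) {
            boolean ok = !pipeline.contains("dn3");
            System.out.println("attempt " + attempt + " with " + pipeline
                + (ok ? " -> success" : " -> failure"));
            if (ok) {
              return;
            }
            // Reported behaviour: the failure is blamed on the head of the
            // pipeline instead of the node that actually timed out, so the
            // healthy nodes are dropped first and the last attempt is left
            // with only the defective node.
            pipeline.remove(0);
            // With correct detection this would be pipeline.remove("dn3"),
            // and attempt 2 with [dn1, dn2] would succeed.
          }
          System.out.println("write fails hard: no datanodes left");
        }
      }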

      Attachments

      1. tmp-3339-dn.patch (1 kB) - Raghu Angadi
      2. HADOOP-3339.patch (3 kB) - Raghu Angadi

        Issue Links

          Activity

          Raghu Angadi created issue -
          Raghu Angadi made changes -
          Description: revised to add "if the bad node times out"; otherwise unchanged from the description above.
          Raghu Angadi added a comment -

          TestDatanodeDeath tests killing different datanodes in the pipeline, and it works. The main difference is that whether the downstream datanode's error is handled correctly depends on which of the two threads (the main data receiver or the "PacketResponder") detects it.

          In TestDatanodeDeath, it is always the PacketResponder that detects it. But when a downstream datanode times out (or when the connection is busy), it is the main IO thread that detects it. The fix I am thinking of is to make the main thread inform the PacketResponder about the failure.
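          A minimal sketch of that idea, with hypothetical class and field names rather than the actual BlockReceiver/PacketResponder code: the main receiving thread records which downstream node failed so the responder can report the right node upstream, instead of learning about the failure only through an interrupt.

          import java.util.concurrent.atomic.AtomicInteger;

          class ReceiverResponderSketch {
            // Index of the downstream datanode that failed, -1 if none.
            // Written by the main IO thread, read by the responder thread.
            private final AtomicInteger failedMirrorIndex = new AtomicInteger(-1);

            // Called by the main data-receiving thread when writing to or
            // reading from the downstream mirror fails or times out.
            void onMirrorError(int mirrorIndex) {
              failedMirrorIndex.compareAndSet(-1, mirrorIndex);
            }

            // Called by the responder thread before sending its ack upstream,
            // so the right node is reported as bad.
            int errorIndexForAck() {
              return failedMirrorIndex.get();
            }
          }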

          Nigel Daley added a comment -

          Assigning to 0.18. This isn't a blocker for 0.17.

          Nigel Daley made changes -
          Fix Version/s 0.18.0 [ 12312972 ]
          dhruba borthakur added a comment -

          +1 on Raghu's proposal.

          Raghu Angadi added a comment -

          The attached patch fixes the main problem described (practically all the time). It informs upstream properly about the downstream failure.

          A similar problem exists on the client side as well: if the 2nd datanode times out, most of the time the client removes the first datanode as the bad one. The issues on the DataNode and the client are similar, but the same fix cannot work, because on the DataNode the responder needs to properly write its state upstream, while the client needs to properly read all the remaining data on the socket from the first datanode.

          The main issue is that the BlockReceiver thread (and DataStreamer in the case of DFSClient) interrupt() the 'responder' thread. But interrupting is a pretty coarse control: we don't know what state the responder is in, and interrupting has different effects depending on that state. To fix this properly we need to redesign how we handle these interactions.

          I am trying out a fix for DFSClient.
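          As a rough illustration of the "coarse control" point only (hypothetical names, not the actual code or the eventual redesign): recording why the responder is being stopped before interrupting it lets the responder act on explicit state rather than on wherever it happened to be blocked.

          class ResponderControlSketch {
            enum Reason { NONE, NORMAL_CLOSE, DOWNSTREAM_FAILED }

            private volatile Reason stopReason = Reason.NONE;

            // Stop the given responder thread, publishing the reason first so
            // the responder can distinguish a clean close from a downstream
            // failure when it is unblocked.
            void stopResponder(Thread responder, Reason reason) {
              stopReason = reason;      // record why we are stopping
              responder.interrupt();    // then unblock it wherever it waits
            }

            // Checked by the responder after it is interrupted.
            boolean downstreamFailed() {
              return stopReason == Reason.DOWNSTREAM_FAILED;
            }
          }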

          Raghu Angadi made changes -
          Attachment tmp-3339-dn.patch [ 12381704 ]
          Raghu Angadi added a comment -

          The attached patch fixes the main problem described in the description. It properly handles the case when the 3rd datanode (or 4th, etc.) fails. Regarding failure at the 2nd datanode, that needs a fix in DFSClient and I don't have a fix for it yet. I will file another jira for that.

          Raghu Angadi made changes -
          Attachment HADOOP-3339.patch [ 12382218 ]
          Tsz Wo Nicholas Sze added a comment -

          +1 patch looks good.

          Tsz Wo Nicholas Sze made changes -
          Hadoop Flags [Reviewed]
          Raghu Angadi added a comment -

          Thanks Nicholas.

          Raghu Angadi made changes -
          Status: Open [ 1 ] → Patch Available [ 10002 ]
          Hadoop QA added a comment -

          -1 overall. Here are the results of testing the latest attachment
          http://issues.apache.org/jira/secure/attachment/12382218/HADOOP-3339.patch
          against trunk revision 656939.

          +1 @author. The patch does not contain any @author tags.

          -1 tests included. The patch doesn't appear to include any new or modified tests.
          Please justify why no tests are needed for this patch.

          +1 javadoc. The javadoc tool did not generate any warning messages.

          +1 javac. The applied patch does not increase the total number of javac compiler warnings.

          +1 findbugs. The patch does not introduce any new Findbugs warnings.

          +1 release audit. The applied patch does not increase the total number of release audit warnings.

          -1 core tests. The patch failed core unit tests.

          +1 contrib tests. The patch passed contrib unit tests.

          Test results: http://hudson.zones.apache.org/hudson/job/Hadoop-Patch/2496/testReport/
          Findbugs warnings: http://hudson.zones.apache.org/hudson/job/Hadoop-Patch/2496/artifact/trunk/build/test/findbugs/newPatchFindbugsWarnings.html
          Checkstyle results: http://hudson.zones.apache.org/hudson/job/Hadoop-Patch/2496/artifact/trunk/build/test/checkstyle-errors.html
          Console output: http://hudson.zones.apache.org/hudson/job/Hadoop-Patch/2496/console

          This message is automatically generated.

          Raghu Angadi added a comment -

          The test failure is another case of HADOOP-3354 and is not related to this patch. Also, HADOOP-3416 has been filed regarding DFSClient.

          Raghu Angadi made changes -
          Link This issue is related to HADOOP-3416 [ HADOOP-3416 ]
          Raghu Angadi added a comment -

          I just committed this.

          Raghu Angadi made changes -
          Status: Patch Available [ 10002 ] → Resolved [ 5 ]
          Release Note: Some of the failures on 3rd datanode in DFS write pipeline are not detected properly. This could lead to hard failure of client's write operation.
          Resolution: Fixed [ 1 ]
          Hudson added a comment -

          Integrated in Hadoop-trunk #499 (See http://hudson.zones.apache.org/hudson/job/Hadoop-trunk/499/ )
          Robert Chansler made changes -
          Release Note: "Some of the failures on 3rd datanode in DFS write pipeline are not detected properly. This could lead to hard failure of client's write operation." → "Improved failure handling of last Data Node in write pipeline."
          Nigel Daley made changes -
          Status: Resolved [ 5 ] → Closed [ 6 ]
          Owen O'Malley made changes -
          Component/s dfs [ 12310710 ]

            People

            • Assignee: Raghu Angadi
            • Reporter: Raghu Angadi
            • Votes: 0
            • Watchers: 2
