Hadoop HDFS
HDFS-724

Pipeline close hangs if one of the datanodes is not responsive.

    Details

    • Hadoop Flags:
      Reviewed

      Description

      In the new pipeline design, pipeline close is implemented by sending an additional empty packet. If one of the datanodes does not respond to this empty packet, the pipeline hangs. It seems that there is no timeout.

      1. stuckWriteAppend20.patch
        28 kB
        Hairong Kuang
      2. pipelineHeartbeat2.patch
        26 kB
        Hairong Kuang
      3. pipelineHeartbeat1.patch
        27 kB
        Hairong Kuang
      4. pipelineHeartbeat.patch
        11 kB
        Hairong Kuang
      5. HDFS-724.20-security.1.patch
        29 kB
        Jitendra Nath Pandey
      6. hbAckReply.patch
        0.8 kB
        Hairong Kuang
      7. h724_20091021.patch
        5 kB
        Tsz Wo Nicholas Sze

        Issue Links

          Activity

          Tsz Wo Nicholas Sze added a comment -

          h724_20091021.patch (not a fix for the bug): fault injection tests to simulate pipeline close with a non-responsive datanode.

          Hairong Kuang added a comment -

          If a datanode really becomes non-responsive, the dfs client is able to detect the problem.

          The issue here is that the test simulates a non-responsive block receiver. While the block receiver is blocked, the packet responder is still alive and sends heartbeats back periodically, so the client still thinks the pipeline is working well.

          The solution:
          1. The packet responder does not send heartbeats; instead, TCP-level keepalives are turned on by setting the socket's "keepalive" option to true.
          2. The dfs client does not wait for acks unless there is at least one outstanding packet.
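          A minimal sketch of option 1 above: enabling TCP-level keepalive on the pipeline socket so the kernel, rather than an application-level heartbeat, detects a dead peer. This is an illustrative use of the standard java.net.Socket API, not code from any patch on this jira.

```java
import java.net.Socket;
import java.net.SocketException;

public class KeepAliveDemo {
    // Turn on SO_KEEPALIVE so the OS periodically probes an idle peer
    // and tears the connection down if the peer is unreachable.
    // Returns true if the option was set successfully.
    static boolean enableKeepAlive(Socket s) {
        try {
            s.setKeepAlive(true);
            return s.getKeepAlive();
        } catch (SocketException e) {
            return false;
        }
    }

    public static void main(String[] args) {
        Socket s = new Socket(); // unconnected socket is enough for the demo
        System.out.println("keepalive=" + enableKeepAlive(s));
    }
}
```

          Note that SO_KEEPALIVE probe intervals are OS-wide settings (often two hours by default), which is one reason the discussion below moves toward an application-level heartbeat instead.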

          dhruba borthakur added a comment -

          3. Another option would be to make the dfsclient send a heartbeat packet via the pipeline. Each BlockReceiver in the datanodes recognizes it as a heartbeat packet and forwards it to the next one in the pipeline. The last datanode, on receipt of this heartbeat packet, triggers its PacketResponder to send the heartbeat back to the previous datanode in the pipeline. The heartbeat message finally arrives back at the client. This way, we can test that the entire pipeline (both forward and backward channels) is alive and active.

          Hairong Kuang added a comment -

          Dhruba, this is a good idea. The dfsclient sends out a heartbeat after the pipeline has been idle for half of the read timeout. A heartbeat is a special packet with a seq# of -1.
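          The convention described here — a heartbeat as a special empty packet whose sequence number is -1 — can be sketched as follows. The class and field names are illustrative, not the actual DFSClient code.

```java
// Hypothetical sketch of the heartbeat-packet convention: an empty
// packet with sequence number -1 is a heartbeat; anything else is data.
public class Packet {
    static final long HEART_BEAT_SEQNO = -1L;

    final long seqno;
    final int dataLen;

    Packet(long seqno, int dataLen) {
        this.seqno = seqno;
        this.dataLen = dataLen;
    }

    boolean isHeartbeat() {
        return seqno == HEART_BEAT_SEQNO && dataLen == 0;
    }

    public static void main(String[] args) {
        System.out.println(new Packet(HEART_BEAT_SEQNO, 0).isHeartbeat()); // heartbeat
        System.out.println(new Packet(42, 512).isHeartbeat());            // regular data
    }
}
```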

          dhruba borthakur added a comment -

          > Dfsclient sends out a heartbeat after the pipeline becomes idle for half of read timeout. A heartbeat is a special packet with seq# of -1.

          +1.

          Hairong Kuang added a comment -

          I am quite torn on whether a heartbeat should be
          1. a regular empty packet, handled exactly the same as a regular data packet; or
          2. a special empty packet with a seq# of -1, treated differently from a regular packet. For example, it does not get added to the packet queue on either the client or the datanode side.

          Solution 1 is much simpler than solution 2. But is there any side effect?

          dhruba borthakur added a comment -

          Agree that solution 1 will be simpler than solution 2. Another thing that might play a part here is that, over time, we might want to add metadata to the heartbeat packet. (For example, maybe a timestamp every time a datanode forwards it; this would enable the client to eliminate slow datanodes from the pipeline if needed.)

          Hairong Kuang added a comment -

          The timestamp meta information could be added to regular data packets as well. My plan is not to send heartbeats periodically. Instead, a heartbeat is sent only if the connection has been idle for too long.

          dhruba borthakur added a comment -

          > a heartbeat is sent only if the connection is idle for too long

          sounds good.

          Hairong Kuang added a comment -

          OK, here comes a patch that does the following:
          1. When a pipeline is idle for half of the read timeout, the dfsclient sends a heartbeat packet that has a sequence number of -1 and a data length of 0;
          2. At the client side, a heartbeat packet is not put in the data queue and hence not in the ack queue;
          3. At the datanode side, the block receiver treats a heartbeat packet mostly like a regular data packet; a heartbeat packet is put in the ack queue and its ack is the same as that of a regular data packet;
          4. The ack to a heartbeat packet may also indicate failures in the pipeline and therefore trigger pipeline recovery.

          Hairong Kuang added a comment -

          This patch ends up implementing the heartbeat packet as a special packet. This is to avoid the complexity of both the data stream thread and the application thread creating packets simultaneously. Otherwise, the dfs client would have to handle issues like synchronization, out-of-sequence packets, etc.

          dhruba borthakur added a comment -

          I am still looking at the patch; two comments:

          1. Is it possible to get rid of lastDataNodeRun() completely? This method existed because the last datanode in the pipeline needed to send heartbeats. Now that is no longer needed.

          2. In DFSClient.java, one change is:

          long timeout = (stage == BlockConstructionStage.DATA_STREAMING) ?
              socketTimeout/2 - (now - lastPacket) : 1000;

          Could timeout sometimes be negative?
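          The expression above can indeed go negative once the pipeline has been idle for more than half the read timeout. A defensive sketch (hypothetical names, not the actual DFSClient code) would clamp the wait at zero:

```java
// Illustrative sketch of the streamer timeout computation with a clamp,
// addressing the review question above. Names are hypothetical.
public class HeartbeatTimeout {
    static final long SOCKET_TIMEOUT = 60_000; // assumed read timeout in ms

    // Wait half the read timeout minus the time since the last packet,
    // but never less than zero (a negative value would be an invalid wait).
    static long streamerTimeout(boolean dataStreaming, long now, long lastPacket) {
        if (!dataStreaming) {
            return 1000;
        }
        return Math.max(0, SOCKET_TIMEOUT / 2 - (now - lastPacket));
    }

    public static void main(String[] args) {
        // Idle longer than half the timeout: unclamped value would be -20000.
        System.out.println(streamerTimeout(true, 100_000, 50_000)); // 0
        System.out.println(streamerTimeout(true, 10_000, 0));       // 20000
        System.out.println(streamerTimeout(false, 0, 0));           // 1000
    }
}
```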

          Hairong Kuang added a comment -

          Thanks Dhruba for your review comments. I guess you are still having jet lag. This patch
          1. merges with trunk (not an easy job),
          2. incorporates Dhruba's comments (it feels scary to remove a large chunk of code), and
          3. adds Nicholas' close-related fault injection test, which is attached to this jira.

          Hadoop QA added a comment -

          -1 overall. Here are the results of testing the latest attachment
          http://issues.apache.org/jira/secure/attachment/12427545/pipelineHeartbeat1.patch
          against trunk revision 889035.

          +1 @author. The patch does not contain any @author tags.

          +1 tests included. The patch appears to include 12 new or modified tests.

          +1 javadoc. The javadoc tool did not generate any warning messages.

          +1 javac. The applied patch does not increase the total number of javac compiler warnings.

          +1 findbugs. The patch does not introduce any new Findbugs warnings.

          +1 release audit. The applied patch does not increase the total number of release audit warnings.

          -1 core tests. The patch failed core unit tests.

          +1 contrib tests. The patch passed contrib unit tests.

          Test results: http://hudson.zones.apache.org/hudson/job/Hdfs-Patch-h5.grid.sp2.yahoo.net/140/testReport/
          Findbugs warnings: http://hudson.zones.apache.org/hudson/job/Hdfs-Patch-h5.grid.sp2.yahoo.net/140/artifact/trunk/build/test/findbugs/newPatchFindbugsWarnings.html
          Checkstyle results: http://hudson.zones.apache.org/hudson/job/Hdfs-Patch-h5.grid.sp2.yahoo.net/140/artifact/trunk/build/test/checkstyle-errors.html
          Console output: http://hudson.zones.apache.org/hudson/job/Hdfs-Patch-h5.grid.sp2.yahoo.net/140/console

          This message is automatically generated.

          Hairong Kuang added a comment -

          This patch additionally handles acks with seqno -2 and some corner cases of the last DN in pipeline.

          Hadoop QA added a comment -

          +1 overall. Here are the results of testing the latest attachment
          http://issues.apache.org/jira/secure/attachment/12427568/pipelineHeartbeat2.patch
          against trunk revision 889090.

          +1 @author. The patch does not contain any @author tags.

          +1 tests included. The patch appears to include 12 new or modified tests.

          +1 javadoc. The javadoc tool did not generate any warning messages.

          +1 javac. The applied patch does not increase the total number of javac compiler warnings.

          +1 findbugs. The patch does not introduce any new Findbugs warnings.

          +1 release audit. The applied patch does not increase the total number of release audit warnings.

          +1 core tests. The patch passed core unit tests.

          +1 contrib tests. The patch passed contrib unit tests.

          Test results: http://hudson.zones.apache.org/hudson/job/Hdfs-Patch-h2.grid.sp2.yahoo.net/83/testReport/
          Findbugs warnings: http://hudson.zones.apache.org/hudson/job/Hdfs-Patch-h2.grid.sp2.yahoo.net/83/artifact/trunk/build/test/findbugs/newPatchFindbugsWarnings.html
          Checkstyle results: http://hudson.zones.apache.org/hudson/job/Hdfs-Patch-h2.grid.sp2.yahoo.net/83/artifact/trunk/build/test/checkstyle-errors.html
          Console output: http://hudson.zones.apache.org/hudson/job/Hdfs-Patch-h2.grid.sp2.yahoo.net/83/console

          This message is automatically generated.

          dhruba borthakur added a comment -

          This is a tough one to review. Phew!

          Hairong Kuang added a comment -

          Let me provide more information about this patch:
          1. A heartbeat is an empty packet with a sequence number of -1;
          2. At the client side, heartbeat packets are not queued in any of the queues, because there is no need to resend a heartbeat if there is an error sending heartbeats.
          3. At the datanode side, heartbeats are queued in the ack queue. A datanode treats a heartbeat the same as a regular data packet, and the ack of a heartbeat packet is the same as a regular data packet's. To distinguish a heartbeat from an end-of-block packet, receivePacket returns -1 when receiving an end-of-block packet.

          Dhruba, please feel free to reach me if you need more explanation. I need this patch to be committed before committing HDFS-101.
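          The receive-side convention described above can be sketched as follows; all names are illustrative rather than the actual BlockReceiver code.

```java
// Hypothetical sketch: heartbeats (seqno -1, empty) are received and acked
// like data packets, while receivePacket() signals the end-of-block packet
// by returning -1 so the two can be told apart.
public class ReceiveDemo {
    static final long HEART_BEAT_SEQNO = -1L;

    // Returns the payload length received, or -1 for the end-of-block packet.
    static int receivePacket(long seqno, int dataLen, boolean lastPacketInBlock) {
        if (lastPacketInBlock) {
            return -1; // end-of-block marker
        }
        // Heartbeats and data packets alike would be queued for acking here.
        return dataLen;
    }

    public static void main(String[] args) {
        System.out.println(receivePacket(7, 512, false));              // 512: data
        System.out.println(receivePacket(HEART_BEAT_SEQNO, 0, false)); // 0: heartbeat
        System.out.println(receivePacket(8, 0, true));                 // -1: end of block
    }
}
```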

          dhruba borthakur added a comment -

          +1. I was able to look at the entire code and the unit test as well. Thanks for the explanation.

          Hudson added a comment -

          Integrated in Hadoop-Hdfs-trunk-Commit #148 (See http://hudson.zones.apache.org/hudson/job/Hadoop-Hdfs-trunk-Commit/148/)

          Hudson added a comment -

          Integrated in Hadoop-Hdfs-trunk #172 (See http://hudson.zones.apache.org/hudson/job/Hadoop-Hdfs-trunk/172/)

          Hudson added a comment -

          Integrated in Hdfs-Patch-h5.grid.sp2.yahoo.net #148 (See http://hudson.zones.apache.org/hudson/job/Hdfs-Patch-h5.grid.sp2.yahoo.net/148/)

          Hairong Kuang added a comment -

          I've committed this.

          Hudson added a comment -

          Integrated in Hdfs-Patch-h2.grid.sp2.yahoo.net #94 (See http://hudson.zones.apache.org/hudson/job/Hdfs-Patch-h2.grid.sp2.yahoo.net/94/)

          Hairong Kuang added a comment -

          I plan to pull in this fix to append20 branch in order to fix HBASE-2861.

          dhruba borthakur added a comment -

          +1 for the append-0.20 patch, please check it into the 0.20-append branch.

          Hairong Kuang added a comment -

          The HDFS-724 append-0.20 patch does not deserialize the ack to a heartbeat correctly. Here is the patch that fixes it.

          stack added a comment -

          I tried it by running a few of the hbase tests that were failing before hbAckReply.patch. They pass now (with hdfs-895 in place on 0.20-append). Let me run the full suite. I'll report back.

          One issue that came up over in hbase-3234 was the fact that hdfs-724 does this:

          -  public static final int DATA_TRANSFER_VERSION = 14;
          +  public static final int DATA_TRANSFER_VERSION = 15;
          

          Could we not do the above on commit to 0.20-append? No biggie, just asking. It would be nice not to make the wire protocol incompatible. Good stuff Hairong.
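          For context on why the bump is incompatible: both ends of the data transfer stream compare versions at setup and refuse to talk on a mismatch. A simplified sketch of that check (the actual handshake in the datanode code may differ) with the constants from the diff above:

```java
// Illustrative sketch: the data transfer protocol requires an exact
// version match, so bumping the constant rejects unpatched peers.
public class VersionCheck {
    static final int OLD_VERSION = 14;
    static final int NEW_VERSION = 15; // bumped by HDFS-724

    static boolean compatible(int clientVersion, int serverVersion) {
        return clientVersion == serverVersion;
    }

    public static void main(String[] args) {
        System.out.println(compatible(NEW_VERSION, NEW_VERSION)); // patched pair: ok
        System.out.println(compatible(OLD_VERSION, NEW_VERSION)); // old client: rejected
    }
}
```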

          stack added a comment -

          +1 on the hbAckReply patch. I ran the hbase tests and all that used to fail now pass (with hdfs-895 in place).

          Hairong Kuang added a comment -

          Thanks Stack for testing this. I just committed hbAckReply patch to append 0.20 branch.

          For the incompatibility problem, as I commented on the hbase jira, HDFS-724 unfortunately changed the syntax and semantics of heartbeat packets.

          Jitendra Nath Pandey added a comment -

          Patch for 20-security uploaded.

          Suresh Srinivas added a comment -

          This patch does not compile for me.

          Suresh Srinivas added a comment -

          My bad, I had not applied HDFS-1057 patch which is required for this patch. +1 for the patch.

          Suresh Srinivas added a comment -

          I committed the patch to 0.20-security branch.


  People

  • Assignee:
    Hairong Kuang
  • Reporter:
    Tsz Wo Nicholas Sze