Hadoop HDFS
  1. Hadoop HDFS
  2. HDFS-1346

DFSClient receives out of order packet ack

    Details

    • Hadoop Flags:
      Reviewed

      Description

      When running 0.20 patched with HDFS-101, we sometimes see an error as follow:
      WARN hdfs.DFSClient: DFSOutputStream ResponseProcessor exception for block blk_-2871223654872350746_21421120java.io.IOException: Responseprocessor: Expecting seq
      no for block blk_-2871223654872350746_21421120 10280 but received 10281
      at org.apache.hadoop.hdfs.DFSClient$DFSOutputStream$ResponseProcessor.run(DFSClient.java:2570)

      This indicates that DFS client expects an ack for packet N, but receives an ack for packet N+1.

      1. HDFS-1346.20-security.1.patch
        2 kB
        Jitendra Nath Pandey
      2. blockrecv-diff.txt
        4 kB
        Todd Lipcon
      3. outOfOrder.patch
        2 kB
        Hairong Kuang

        Issue Links

          Activity

          Hide
          Hairong Kuang added a comment -

          This seems to be caused by a race condition in BlockReceiver#PacketResponder.run(). It has the following piece of code where mirrorError is shared by two threads:

          1:  if (!mirrorError) {
          2:    // read an ack from downstream datanode
          3:    ack.readFields(mirrorIn);
          4:    ...
          5:    seqno = ack.getSeqno();
          6:  }
          7:  if (seqno >= 0 || mirrorError) {
          8:    Packet pkt = null;
          9:    synchronized (this) {
          10:    ...
          11:    pkt = ackQueue.removeFirst();
          12:    expected = pkt.seqno;
          13:    ...;
          14:   }
          15: }
          

          If starting at line 1, mirrorError is false, the thread reads an ack from a downsream datanode (line 3). If it happens that the ack is for a heartbeat packet, seqno is -1 (line 5). Then if it happens that the other thread changes mirrorError to be true in between lines 4 and 5, the condition becomes true on line 7. A data packet is removed from ackQueue on line 12, which should not because the ack is for a heartbeat packet not for a data packet. So an ack for a data packet ends up being dropped.

          Show
          Hairong Kuang added a comment - This seems to be caused by a race condition in BlockReceiver#PacketResponder.run(). It has the following piece of code where mirrorError is shared by two threads: 1: if (!mirrorError) { 2: // read an ack from downstream datanode 3: ack.readFields(mirrorIn); 4: ... 5: seqno = ack.getSeqno(); 6: } 7: if (seqno >= 0 || mirrorError) { 8: Packet pkt = null ; 9: synchronized ( this ) { 10: ... 11: pkt = ackQueue.removeFirst(); 12: expected = pkt.seqno; 13: ...; 14: } 15: } If starting at line 1, mirrorError is false, the thread reads an ack from a downsream datanode (line 3). If it happens that the ack is for a heartbeat packet, seqno is -1 (line 5). Then if it happens that the other thread changes mirrorError to be true in between lines 4 and 5, the condition becomes true on line 7. A data packet is removed from ackQueue on line 12, which should not because the ack is for a heartbeat packet not for a data packet. So an ack for a data packet ends up being dropped.
          Hide
          Hairong Kuang added a comment -

          This patch introduces a local variable localMirrorError to avoid the race condition.

          Show
          Hairong Kuang added a comment - This patch introduces a local variable localMirrorError to avoid the race condition.
          Hide
          Todd Lipcon added a comment -

          Any way to trigger this with a unit test that would inject a Thread.sleep in there using mockito? Or just too hard? (I'm surprised I haven't run into this ever in append testing)

          Show
          Todd Lipcon added a comment - Any way to trigger this with a unit test that would inject a Thread.sleep in there using mockito? Or just too hard? (I'm surprised I haven't run into this ever in append testing)
          Hide
          sam rash added a comment -

          not easily--PacketResponder is a non-static inner class. Constructing a BlockReceiver requires a Datanode instance.
          If you can harness a Datanode, then you need to stub out the DataInputStream and figure out when to fire a callback (somehow when ack.readFields() reads from the DatainputStream, but not before).

          I think it's possible, but we haven't had time yet

          Show
          sam rash added a comment - not easily--PacketResponder is a non-static inner class. Constructing a BlockReceiver requires a Datanode instance. If you can harness a Datanode, then you need to stub out the DataInputStream and figure out when to fire a callback (somehow when ack.readFields() reads from the DatainputStream, but not before). I think it's possible, but we haven't had time yet
          Hide
          Todd Lipcon added a comment -

          I'm noticing that the HDFS-101 implementation we've been running with is a bit different than what made it into the branch-0.20-append, and I don't think susceptible to the issue. Attaching a diff of our BlockReceiver.java against the branch-20-append one - ignore the bit at the end with checkDiskError, etc - that's from another patch.

          I think the relevent diff chunk here is the first - we detect a HEART_BEAT seqno immediately and "continue" around the loop in that case. Is this a cleaner fix than the local variable?

          Show
          Todd Lipcon added a comment - I'm noticing that the HDFS-101 implementation we've been running with is a bit different than what made it into the branch-0.20-append, and I don't think susceptible to the issue. Attaching a diff of our BlockReceiver.java against the branch-20-append one - ignore the bit at the end with checkDiskError, etc - that's from another patch. I think the relevent diff chunk here is the first - we detect a HEART_BEAT seqno immediately and "continue" around the loop in that case. Is this a cleaner fix than the local variable?
          Hide
          Hairong Kuang added a comment -

          Todd, yours is missing this patch: https://issues.apache.org/jira/secure/attachment/12439379/pipelineHeartbeat.patch. HDFS-101 says that it fixes a bug of incorrect handle of pipeline heartbeat in yahoo's hadoop security branch 0.20. But I did not put the bug description there.

          Koji, do you still remember what exact problem that pipelineHeartbeat.patch is fixed?

          Show
          Hairong Kuang added a comment - Todd, yours is missing this patch: https://issues.apache.org/jira/secure/attachment/12439379/pipelineHeartbeat.patch . HDFS-101 says that it fixes a bug of incorrect handle of pipeline heartbeat in yahoo's hadoop security branch 0.20. But I did not put the bug description there. Koji, do you still remember what exact problem that pipelineHeartbeat.patch is fixed?
          Hide
          Hairong Kuang added a comment -

          This patch has been deployed on our production cluster for a while and it seems to have fixed the bug. The fix helps reducing the chance of losing data when hflushing.

          Shall we commit it to the 0.20 append branch?

          Show
          Hairong Kuang added a comment - This patch has been deployed on our production cluster for a while and it seems to have fixed the bug. The fix helps reducing the chance of losing data when hflushing. Shall we commit it to the 0.20 append branch?
          Hide
          dhruba borthakur added a comment -

          Hi Hairong, I agree with your observation. Please commit it to 0.20 append branch too. Thanks.

          Show
          dhruba borthakur added a comment - Hi Hairong, I agree with your observation. Please commit it to 0.20 append branch too. Thanks.
          Hide
          Hairong Kuang added a comment -

          I've committed this.

          Show
          Hairong Kuang added a comment - I've committed this.
          Hide
          Jitendra Nath Pandey added a comment -

          Patch for 20-security, ported from 20-append.

          Show
          Jitendra Nath Pandey added a comment - Patch for 20-security, ported from 20-append.
          Hide
          Suresh Srinivas added a comment -

          +1 for the patch. I committed it to 0.20-security branch.

          Show
          Suresh Srinivas added a comment - +1 for the patch. I committed it to 0.20-security branch.
          Hide
          Matt Foley added a comment -

          Closed upon release of 0.20.205.0

          Show
          Matt Foley added a comment - Closed upon release of 0.20.205.0

            People

            • Assignee:
              Hairong Kuang
              Reporter:
              Hairong Kuang
            • Votes:
              0 Vote for this issue
              Watchers:
              8 Start watching this issue

              Dates

              • Created:
                Updated:
                Resolved:

                Development