Hadoop HDFS
  1. Hadoop HDFS
  2. HDFS-673

BlockReceiver#PacketResponder should not remove a packet from the ack queue before its ack is sent

    Details

    • Type: Bug Bug
    • Status: Closed
    • Priority: Blocker Blocker
    • Resolution: Fixed
    • Affects Version/s: None
    • Fix Version/s: 0.21.0
    • Component/s: datanode
    • Labels:
      None
    • Hadoop Flags:
      Reviewed

      Description

      After a BlockReceiver finish receiving the last packet of a block, it waits until the ack queue becomes empty. It them assumes that all acks have been sent and shunts down network connections. The current code removes a packet from the ack queue before its ack is sent. So there is a chance that the connection gets closed before an ack is sent.

      1. pkgRmOrder.patch
        3 kB
        Hairong Kuang

        Activity

        Hide
        sravankorumilli added a comment -

        Hi Hairong, I am using 20.1 hadoop, I did not find the problem that you are actually mentioning. What I am getting from the code is Dataxceiver thread will close the streams only after the responder thread has been completed. Can you please check and validate my comments, if I am wrong can you explain the problem.

        Show
        sravankorumilli added a comment - Hi Hairong, I am using 20.1 hadoop, I did not find the problem that you are actually mentioning. What I am getting from the code is Dataxceiver thread will close the streams only after the responder thread has been completed. Can you please check and validate my comments, if I am wrong can you explain the problem.
        Hide
        Hudson added a comment -

        Integrated in Hdfs-Patch-h5.grid.sp2.yahoo.net #78 (See http://hudson.zones.apache.org/hudson/job/Hdfs-Patch-h5.grid.sp2.yahoo.net/78/)

        Show
        Hudson added a comment - Integrated in Hdfs-Patch-h5.grid.sp2.yahoo.net #78 (See http://hudson.zones.apache.org/hudson/job/Hdfs-Patch-h5.grid.sp2.yahoo.net/78/ )
        Hide
        Hudson added a comment -

        Integrated in Hadoop-Hdfs-trunk-Commit #79 (See http://hudson.zones.apache.org/hudson/job/Hadoop-Hdfs-trunk-Commit/79/)

        Show
        Hudson added a comment - Integrated in Hadoop-Hdfs-trunk-Commit #79 (See http://hudson.zones.apache.org/hudson/job/Hadoop-Hdfs-trunk-Commit/79/ )
        Hide
        Hudson added a comment -

        Integrated in Hdfs-Patch-h2.grid.sp2.yahoo.net #47 (See http://hudson.zones.apache.org/hudson/job/Hdfs-Patch-h2.grid.sp2.yahoo.net/47/)

        Show
        Hudson added a comment - Integrated in Hdfs-Patch-h2.grid.sp2.yahoo.net #47 (See http://hudson.zones.apache.org/hudson/job/Hdfs-Patch-h2.grid.sp2.yahoo.net/47/ )
        Hide
        Hairong Kuang added a comment -

        I close this jira for now. If anyone wants this fix to be in 0.20, I will do so later.

        Show
        Hairong Kuang added a comment - I close this jira for now. If anyone wants this fix to be in 0.20, I will do so later.
        Hide
        Hairong Kuang added a comment -

        I committed this to trunk and branch 0.21.

        I checked branch 0.20. It seems to have the same problem. But strangely I have not seen tests failed on this in 0.20 branch. Anyboday has seen this bug in 0.20 and wants to commit the fix to 0.20?

        A consequence of this bug is that it introduces false alarms. A good datanode may fail to send an ack to the client on closedSocket exception when receiving a block.

        Show
        Hairong Kuang added a comment - I committed this to trunk and branch 0.21. I checked branch 0.20. It seems to have the same problem. But strangely I have not seen tests failed on this in 0.20 branch. Anyboday has seen this bug in 0.20 and wants to commit the fix to 0.20? A consequence of this bug is that it introduces false alarms. A good datanode may fail to send an ack to the client on closedSocket exception when receiving a block.
        Hide
        Tsz Wo Nicholas Sze added a comment -

        +1 on the patch.

        Show
        Tsz Wo Nicholas Sze added a comment - +1 on the patch.
        Hide
        Hairong Kuang added a comment -

        There were two failed tests: org.apache.hadoop.hdfs.server.namenode.TestBlockUnderConstruction.testBlockCreation and org.apache.hadoop.hdfs.TestFileAppend2.testComplexAppend. The first one is a known bug. For the second one, the append write failed on "Too many open files".
        This seems not related to the change in this jira. I filed HDFS-690 to track the bug.

        Show
        Hairong Kuang added a comment - There were two failed tests: org.apache.hadoop.hdfs.server.namenode.TestBlockUnderConstruction.testBlockCreation and org.apache.hadoop.hdfs.TestFileAppend2.testComplexAppend. The first one is a known bug. For the second one, the append write failed on "Too many open files". This seems not related to the change in this jira. I filed HDFS-690 to track the bug.
        Hide
        Hadoop QA added a comment -

        -1 overall. Here are the results of testing the latest attachment
        http://issues.apache.org/jira/secure/attachment/12421358/pkgRmOrder.patch
        against trunk revision 822153.

        +1 @author. The patch does not contain any @author tags.

        -1 tests included. The patch doesn't appear to include any new or modified tests.
        Please justify why no new tests are needed for this patch.
        Also please list what manual steps were performed to verify this patch.

        +1 javadoc. The javadoc tool did not generate any warning messages.

        +1 javac. The applied patch does not increase the total number of javac compiler warnings.

        +1 findbugs. The patch does not introduce any new Findbugs warnings.

        +1 release audit. The applied patch does not increase the total number of release audit warnings.

        -1 core tests. The patch failed core unit tests.

        +1 contrib tests. The patch passed contrib unit tests.

        Test results: http://hudson.zones.apache.org/hudson/job/Hdfs-Patch-h5.grid.sp2.yahoo.net/59/testReport/
        Findbugs warnings: http://hudson.zones.apache.org/hudson/job/Hdfs-Patch-h5.grid.sp2.yahoo.net/59/artifact/trunk/build/test/findbugs/newPatchFindbugsWarnings.html
        Checkstyle results: http://hudson.zones.apache.org/hudson/job/Hdfs-Patch-h5.grid.sp2.yahoo.net/59/artifact/trunk/build/test/checkstyle-errors.html
        Console output: http://hudson.zones.apache.org/hudson/job/Hdfs-Patch-h5.grid.sp2.yahoo.net/59/console

        This message is automatically generated.

        Show
        Hadoop QA added a comment - -1 overall. Here are the results of testing the latest attachment http://issues.apache.org/jira/secure/attachment/12421358/pkgRmOrder.patch against trunk revision 822153. +1 @author. The patch does not contain any @author tags. -1 tests included. The patch doesn't appear to include any new or modified tests. Please justify why no new tests are needed for this patch. Also please list what manual steps were performed to verify this patch. +1 javadoc. The javadoc tool did not generate any warning messages. +1 javac. The applied patch does not increase the total number of javac compiler warnings. +1 findbugs. The patch does not introduce any new Findbugs warnings. +1 release audit. The applied patch does not increase the total number of release audit warnings. -1 core tests. The patch failed core unit tests. +1 contrib tests. The patch passed contrib unit tests. Test results: http://hudson.zones.apache.org/hudson/job/Hdfs-Patch-h5.grid.sp2.yahoo.net/59/testReport/ Findbugs warnings: http://hudson.zones.apache.org/hudson/job/Hdfs-Patch-h5.grid.sp2.yahoo.net/59/artifact/trunk/build/test/findbugs/newPatchFindbugsWarnings.html Checkstyle results: http://hudson.zones.apache.org/hudson/job/Hdfs-Patch-h5.grid.sp2.yahoo.net/59/artifact/trunk/build/test/checkstyle-errors.html Console output: http://hudson.zones.apache.org/hudson/job/Hdfs-Patch-h5.grid.sp2.yahoo.net/59/console This message is automatically generated.
        Hide
        Hairong Kuang added a comment -

        The existing tests sometime fail with this error. So I do not think there is a need for new test. Furthuremore, I have tested this manually too.

        Show
        Hairong Kuang added a comment - The existing tests sometime fail with this error. So I do not think there is a need for new test. Furthuremore, I have tested this manually too.
        Hide
        Tsz Wo Nicholas Sze added a comment -

        Patch looks good. Do we need a test?

        Show
        Tsz Wo Nicholas Sze added a comment - Patch looks good. Do we need a test?
        Hide
        Hairong Kuang added a comment -

        A patch for review.

        Show
        Hairong Kuang added a comment - A patch for review.

          People

          • Assignee:
            Hairong Kuang
            Reporter:
            Hairong Kuang
          • Votes:
            0 Vote for this issue
            Watchers:
            2 Start watching this issue

            Dates

            • Created:
              Updated:
              Resolved:

              Development