Hadoop HDFS
  1. Hadoop HDFS
  2. HDFS-637

DataNode sends a Success ack when block write fails

    Details

    • Type: Bug Bug
    • Status: Closed
    • Priority: Blocker Blocker
    • Resolution: Fixed
    • Affects Version/s: None
    • Fix Version/s: 0.21.0
    • Component/s: datanode
    • Labels:
      None
    • Hadoop Flags:
      Reviewed

      Description

      When I work on HDFS-624, I saw TestFileAppend3#TC7 occasionally fails. After lots of debug, I saw that the client unexpected received a response of "-2 SUCCESS SUCCESS" in which -2 is the packet sequence number. This happened in a pipeline of 2 datanodes and one of them failed. It turned out when block receiver fails, it shuts down itself and interrupts the packet responder but responder tries to handle interruption with the condition "Thread.isInterrupted()" but unfortunately a thread's interrupt status is not set in some cases as explained in the Thread#interrupt javadoc:

      If this thread is blocked in an invocation of the wait(), wait(long), or wait(long, int) methods of the Object class, or of the join(), join(long), join(long, int), sleep(long), or sleep(long, int), methods of this class, then its interrupt status will be cleared and it will receive an InterruptedException.

      So datanode does not detect the interruption and continues as if no error occurs.

      1. interrupted1.patch
        2 kB
        Hairong Kuang
      2. interrupted.patch
        1 kB
        Hairong Kuang

        Activity

        Hide
        Hairong Kuang added a comment -

        This patch should fix the bug.

        Show
        Hairong Kuang added a comment - This patch should fix the bug.
        Hide
        Kan Zhang added a comment -

        I like Raghu's suggestion, which is to simply set a boolean flag in the catch clause (the clause has to be moved to the outset) and add a checking of the flag to the if (Thread.interrupted()) {} block. That way the exit logic is easier to understand (with the comments there). Having the exit logic in one place also facilitates adding further checking on the reason of the interrupt (i.e., local error or downstream error).

        Show
        Kan Zhang added a comment - I like Raghu's suggestion, which is to simply set a boolean flag in the catch clause (the clause has to be moved to the outset) and add a checking of the flag to the if (Thread.interrupted()) {} block. That way the exit logic is easier to understand (with the comments there). Having the exit logic in one place also facilitates adding further checking on the reason of the interrupt (i.e., local error or downstream error).
        Hide
        Hairong Kuang added a comment -

        Ok, whatever is easier for your future change, I will do it for you.

        Show
        Hairong Kuang added a comment - Ok, whatever is easier for your future change, I will do it for you.
        Hide
        Kan Zhang added a comment -

        +1 for the patch.

        Show
        Kan Zhang added a comment - +1 for the patch.
        Hide
        Hairong Kuang added a comment -

        All unit tests passed. Test patch result:
        [exec] +1 overall.
        [exec]
        [exec] +1 @author. The patch does not contain any @author tags.
        [exec]
        [exec] -1 tests included. The patch doesn't appear to include any new or modified tests.
        [exec] Please justify why no new tests are needed for this patch.
        [exec] Also please list what manual steps were performed to verify this patch.
        [exec]
        [exec] +1 javadoc. The javadoc tool did not generate any warning messages.
        [exec]
        [exec] +1 javac. The applied patch does not increase the total number of javac compiler warnings.
        [exec]
        [exec] +1 findbugs. The patch does not introduce any new Findbugs warnings.
        [exec]
        [exec] +1 release audit. The applied patch does not increase the total number of release audit warnings.

        Unit test is not included because HDFS-624 exposed the bug and with this patch the previously failed test is passed.

        Show
        Hairong Kuang added a comment - All unit tests passed. Test patch result: [exec] +1 overall. [exec] [exec] +1 @author. The patch does not contain any @author tags. [exec] [exec] -1 tests included. The patch doesn't appear to include any new or modified tests. [exec] Please justify why no new tests are needed for this patch. [exec] Also please list what manual steps were performed to verify this patch. [exec] [exec] +1 javadoc. The javadoc tool did not generate any warning messages. [exec] [exec] +1 javac. The applied patch does not increase the total number of javac compiler warnings. [exec] [exec] +1 findbugs. The patch does not introduce any new Findbugs warnings. [exec] [exec] +1 release audit. The applied patch does not increase the total number of release audit warnings. Unit test is not included because HDFS-624 exposed the bug and with this patch the previously failed test is passed.
        Hide
        Hairong Kuang added a comment -

        I've just committed this.

        Show
        Hairong Kuang added a comment - I've just committed this.
        Hide
        Hudson added a comment -

        Integrated in Hadoop-Hdfs-trunk-Commit #57 (See http://hudson.zones.apache.org/hudson/job/Hadoop-Hdfs-trunk-Commit/57/)

        Show
        Hudson added a comment - Integrated in Hadoop-Hdfs-trunk-Commit #57 (See http://hudson.zones.apache.org/hudson/job/Hadoop-Hdfs-trunk-Commit/57/ )
        Hide
        Hudson added a comment -

        Integrated in Hadoop-Hdfs-trunk #98 (See http://hudson.zones.apache.org/hudson/job/Hadoop-Hdfs-trunk/98/)

        Show
        Hudson added a comment - Integrated in Hadoop-Hdfs-trunk #98 (See http://hudson.zones.apache.org/hudson/job/Hadoop-Hdfs-trunk/98/ )
        Hide
        gary murry added a comment -

        No new unit tests were added because it was found with TestFileAppend3#TC7, which can act as a regression test.

        Show
        gary murry added a comment - No new unit tests were added because it was found with TestFileAppend3#TC7, which can act as a regression test.
        Hide
        Hudson added a comment -

        Integrated in Hdfs-Patch-h2.grid.sp2.yahoo.net #20 (See http://hudson.zones.apache.org/hudson/job/Hdfs-Patch-h2.grid.sp2.yahoo.net/20/)

        Show
        Hudson added a comment - Integrated in Hdfs-Patch-h2.grid.sp2.yahoo.net #20 (See http://hudson.zones.apache.org/hudson/job/Hdfs-Patch-h2.grid.sp2.yahoo.net/20/ )

          People

          • Assignee:
            Hairong Kuang
            Reporter:
            Hairong Kuang
          • Votes:
            0 Vote for this issue
            Watchers:
            3 Start watching this issue

            Dates

            • Created:
              Updated:
              Resolved:

              Development