Hadoop HDFS / HDFS-690

TestAppend2#testComplexAppend failed on "Too many open files"

    Details

    • Type: Bug
    • Status: Closed
    • Priority: Blocker
    • Resolution: Fixed
    • Affects Version/s: 0.21.0
    • Fix Version/s: 0.21.0
    • Component/s: test
    • Labels:
      None
    • Hadoop Flags:
      Reviewed

      Description

      The append write failed with "Too many open files".
      Some bytes could not be appended to a file because of the following error:
      java.io.IOException: Cannot run program "stat": java.io.IOException: error=24, Too many open files
      at java.lang.ProcessBuilder.start(ProcessBuilder.java:459)
      at java.lang.Runtime.exec(Runtime.java:593)
      at java.lang.Runtime.exec(Runtime.java:466)
      at org.apache.hadoop.fs.FileUtil$HardLink.getLinkCount(FileUtil.java:644)
      at org.apache.hadoop.hdfs.server.datanode.ReplicaInfo.unlinkBlock(ReplicaInfo.java:205)
      at org.apache.hadoop.hdfs.server.datanode.FSDataset.append(FSDataset.java:1075)
      at org.apache.hadoop.hdfs.server.datanode.FSDataset.append(FSDataset.java:1058)
      at org.apache.hadoop.hdfs.server.datanode.BlockReceiver.<init>(BlockReceiver.java:110)
      at org.apache.hadoop.hdfs.server.datanode.DataXceiver.opWriteBlock(DataXceiver.java:258)
      at org.apache.hadoop.hdfs.protocol.DataTransferProtocol$Receiver.opWriteBlock(DataTransferProtocol.java:382)
      at org.apache.hadoop.hdfs.protocol.DataTransferProtocol$Receiver.processOp(DataTransferProtocol.java:323)
      at org.apache.hadoop.hdfs.server.datanode.DataXceiver.run(DataXceiver.java:111)
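
      On Linux, error=24 is EMFILE: the process has reached its open file descriptor limit, so Runtime.exec cannot create the pipes it needs to launch "stat". When chasing a leak like this one, it can help to log the descriptor count of the JVM while the test runs. The following is a minimal sketch, not part of the patch; the class and method names are hypothetical, and it assumes a HotSpot-style JVM on a Unix platform that exposes com.sun.management.UnixOperatingSystemMXBean.

```java
import java.lang.management.ManagementFactory;
import java.lang.management.OperatingSystemMXBean;

// Hypothetical diagnostic helper; not taken from the Hadoop source tree.
public class OpenFdCheck {

    /**
     * Returns {open, max} file descriptor counts for this JVM,
     * or {-1, -1} when the platform bean does not expose them.
     */
    static long[] fdCounts() {
        OperatingSystemMXBean os = ManagementFactory.getOperatingSystemMXBean();
        if (os instanceof com.sun.management.UnixOperatingSystemMXBean) {
            com.sun.management.UnixOperatingSystemMXBean unix =
                (com.sun.management.UnixOperatingSystemMXBean) os;
            return new long[] {
                unix.getOpenFileDescriptorCount(),
                unix.getMaxFileDescriptorCount()
            };
        }
        return new long[] { -1L, -1L };
    }

    public static void main(String[] args) {
        long[] counts = fdCounts();
        System.out.println("open fds = " + counts[0] + ", limit = " + counts[1]);
    }
}
```

      A count that keeps climbing toward the limit across append iterations would point at threads or streams that are never released.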

      1. leakingThreads.patch
        2 kB
        Hairong Kuang
      2. leakingThreads1.patch
        2 kB
        Hairong Kuang

        Activity

        Hudson added a comment -

        Integrated in Hdfs-Patch-h5.grid.sp2.yahoo.net #78 (See http://hudson.zones.apache.org/hudson/job/Hdfs-Patch-h5.grid.sp2.yahoo.net/78/)

        Hudson added a comment -

        Integrated in Hadoop-Hdfs-trunk #120 (See http://hudson.zones.apache.org/hudson/job/Hadoop-Hdfs-trunk/120/)

        stack added a comment -

        @Hairong np. I see now that the puts and the checks for empty are inside blocks that synchronize on this. Running tests now to see whether this patch fixes HDFS-720.

        Hudson added a comment -

        Integrated in Hdfs-Patch-h2.grid.sp2.yahoo.net #54 (See http://hudson.zones.apache.org/hudson/job/Hdfs-Patch-h2.grid.sp2.yahoo.net/54/)
        TestAppend2#testComplexAppend failed on "Too many open files". Contributed by Hairong Kuang.

        Hudson added a comment -

        Integrated in Hadoop-Hdfs-trunk-Commit #79 (See http://hudson.zones.apache.org/hudson/job/Hadoop-Hdfs-trunk-Commit/79/)
        TestAppend2#testComplexAppend failed on "Too many open files". Contributed by Hairong Kuang.

        Hairong Kuang added a comment -

        I've committed this!

        Stack, I committed this because the bug was intermittently breaking our build, which was very annoying. Please continue the discussion of the synchronization problem at HDFS-720.

        Hadoop QA added a comment -

        +1 overall. Here are the results of testing the latest attachment
        http://issues.apache.org/jira/secure/attachment/12422943/leakingThreads1.patch
        against trunk revision 828846.

        +1 @author. The patch does not contain any @author tags.

        -1 tests included. The patch doesn't appear to include any new or modified tests.
        Please justify why no new tests are needed for this patch.
        Also please list what manual steps were performed to verify this patch.

        +1 javadoc. The javadoc tool did not generate any warning messages.

        +1 javac. The applied patch does not increase the total number of javac compiler warnings.

        +1 findbugs. The patch does not introduce any new Findbugs warnings.

        +1 release audit. The applied patch does not increase the total number of release audit warnings.

        +1 core tests. The patch passed core unit tests.

        +1 contrib tests. The patch passed contrib unit tests.

        Test results: http://hudson.zones.apache.org/hudson/job/Hdfs-Patch-h2.grid.sp2.yahoo.net/52/testReport/
        Findbugs warnings: http://hudson.zones.apache.org/hudson/job/Hdfs-Patch-h2.grid.sp2.yahoo.net/52/artifact/trunk/build/test/findbugs/newPatchFindbugsWarnings.html
        Checkstyle results: http://hudson.zones.apache.org/hudson/job/Hdfs-Patch-h2.grid.sp2.yahoo.net/52/artifact/trunk/build/test/checkstyle-errors.html
        Console output: http://hudson.zones.apache.org/hudson/job/Hdfs-Patch-h2.grid.sp2.yahoo.net/52/console

        This message is automatically generated.

        Hairong Kuang added a comment -

        Puts are already synchronized. There is no need to check for an empty ackQueue because the packet responder is the only thread that removes packets from the queue. It already checks that the queue is not empty, gets the packet, sends its ack, and then removes the packet from the queue. The synchronization/notification problem was introduced by HDFS-673. I had not thought it through carefully when working on that jira.
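
        The protocol described above can be sketched as follows. This is an illustration only, not the actual BlockReceiver code: the class name, the method names (enqueue, peekHead, removeAckHead, close, respond, demo) and the use of Long sequence numbers are assumptions. Only ackQueue, removeFirst() and notifyAll() come from this issue. The point is that the removal and the notification happen together inside one synchronized helper, so a writer waiting for the queue to drain is always woken up and can exit.

```java
import java.util.ArrayList;
import java.util.LinkedList;
import java.util.List;

// Hypothetical sketch of the ack-queue protocol; names are illustrative.
public class AckQueueSketch {
    private final LinkedList<Long> ackQueue = new LinkedList<>();
    private boolean running = true;

    /** Writer side: queue a packet that awaits an ack (puts are synchronized). */
    synchronized void enqueue(long seqno) {
        ackQueue.addLast(seqno);
        notifyAll(); // wake the responder if it is waiting for work
    }

    /** Responder side: wait for a packet and return it without removing it.
     *  Returns null once the queue is closed and empty. */
    synchronized Long peekHead() throws InterruptedException {
        while (running && ackQueue.isEmpty()) {
            wait();
        }
        return ackQueue.isEmpty() ? null : ackQueue.getFirst();
    }

    /** The two statements that must stay together: remove, then notify. */
    private synchronized void removeAckHead() {
        ackQueue.removeFirst();
        notifyAll(); // wake a writer blocked in close()
    }

    /** Writer side: wait until every packet is acked, then stop the responder. */
    synchronized void close() throws InterruptedException {
        while (!ackQueue.isEmpty()) {
            wait();
        }
        running = false;
        notifyAll();
    }

    /** Responder loop: the only code path that removes packets. */
    void respond(List<Long> acked) throws InterruptedException {
        Long seqno;
        while ((seqno = peekHead()) != null) {
            acked.add(seqno); // stand-in for sending the ack upstream
            removeAckHead();
        }
    }

    /** Runs one writer and one responder; returns the acked sequence numbers. */
    static List<Long> demo(int packets) throws InterruptedException {
        AckQueueSketch q = new AckQueueSketch();
        List<Long> acked = new ArrayList<>();
        Thread responder = new Thread(() -> {
            try {
                q.respond(acked);
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt();
            }
        });
        responder.start();
        for (long i = 0; i < packets; i++) {
            q.enqueue(i);
        }
        q.close();        // returns only because removeAckHead() notifies
        responder.join(); // both threads exit and release their resources
        return acked;
    }

    public static void main(String[] args) throws InterruptedException {
        System.out.println("acked packets: " + demo(5).size());
    }
}
```

        If removeAckHead() dropped the notifyAll() call, or performed the removal outside the monitor, close() could wait forever in this sketch, which mirrors the stuck writer thread described in this issue.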

        stack added a comment -

        Does this patch do enough? It synchronizes the remove but not the puts to ackQueue nor the tests for empty ackQueue.

        Konstantin Boudnik added a comment -

        +1 patch looks good!

        Hairong Kuang added a comment -

        This patch creates a private method as Cos suggested.

        Konstantin Boudnik added a comment -

        How about moving them to a separate private method instead? Then it'd be clear how they are supposed to be called.

        Tsz Wo Nicholas Sze added a comment -

        +1 patch looks good

        ackQueue.removeFirst();
        notifyAll();
        

        The two statements above have to go together by design. I would make the same mistake if I changed the code. How about adding some comments to make it clear?

        Hairong Kuang added a comment -

        No unit test is included because the patch is meant to make the existing TestAppend2#testComplexAppend pass.

        Hadoop QA added a comment -

        +1 overall. Here are the results of testing the latest attachment
        http://issues.apache.org/jira/secure/attachment/12422824/leakingThreads.patch
        against trunk revision 828116.

        +1 @author. The patch does not contain any @author tags.

        -1 tests included. The patch doesn't appear to include any new or modified tests.
        Please justify why no new tests are needed for this patch.
        Also please list what manual steps were performed to verify this patch.

        +1 javadoc. The javadoc tool did not generate any warning messages.

        +1 javac. The applied patch does not increase the total number of javac compiler warnings.

        +1 findbugs. The patch does not introduce any new Findbugs warnings.

        +1 release audit. The applied patch does not increase the total number of release audit warnings.

        +1 core tests. The patch passed core unit tests.

        +1 contrib tests. The patch passed contrib unit tests.

        Test results: http://hudson.zones.apache.org/hudson/job/Hdfs-Patch-h2.grid.sp2.yahoo.net/47/testReport/
        Findbugs warnings: http://hudson.zones.apache.org/hudson/job/Hdfs-Patch-h2.grid.sp2.yahoo.net/47/artifact/trunk/build/test/findbugs/newPatchFindbugsWarnings.html
        Checkstyle results: http://hudson.zones.apache.org/hudson/job/Hdfs-Patch-h2.grid.sp2.yahoo.net/47/artifact/trunk/build/test/checkstyle-errors.html
        Console output: http://hudson.zones.apache.org/hudson/job/Hdfs-Patch-h2.grid.sp2.yahoo.net/47/console

        This message is automatically generated.

        Hairong Kuang added a comment -

        This patch fixes the bug.

        Hairong Kuang added a comment -

        This bug seems to be caused by HDFS-673. The change made in HDFS-673 sometimes prevents the main write thread from exiting, so it never releases the resources it holds, which eventually causes the "Too many open files" error.


          People

          • Assignee:
            Hairong Kuang
            Reporter:
            Hairong Kuang
          • Votes:
            0
            Watchers:
            4
