Hadoop HDFS / HDFS-723

Deadlock in DFSClient#DFSOutputStream

    Details

    • Type: Bug
    • Status: Resolved
    • Priority: Blocker
    • Resolution: Fixed
    • Affects Version/s: None
    • Fix Version/s: 0.20.2
    • Component/s: None
    • Labels:
      None
    • Hadoop Flags:
      Reviewed

      Description

      While I was running some append-related tests, I hit this deadlock:

      Found one Java-level deadlock:
      =============================
      "Thread-3":
        waiting to lock monitor 0x000000012ee044f0 (object 0x0000000107a0ded0, a org.apache.hadoop.hdfs.DFSClient$DFSOutputStream),
        which is held by "main"
      "main":
        waiting to lock monitor 0x000000012eeb71a8 (object 0x00000001082b0748, a org.apache.hadoop.hdfs.DFSClient$LeaseChecker),
        which is held by "Thread-3"
      
      Java stack information for the threads listed above:
      ===================================================
      "Thread-3":
              at org.apache.hadoop.hdfs.DFSClient$DFSOutputStream.close(DFSClient.java:3582)
              - waiting to lock <0x0000000107a0ded0> (a org.apache.hadoop.hdfs.DFSClient$DFSOutputStream)
              at org.apache.hadoop.hdfs.DFSClient$LeaseChecker.close(DFSClient.java:1175)
              - locked <0x00000001082b0748> (a org.apache.hadoop.hdfs.DFSClient$LeaseChecker)
              at org.apache.hadoop.hdfs.DFSClient.close(DFSClient.java:306)
              - locked <0x000000010824d640> (a org.apache.hadoop.hdfs.DFSClient)
              at org.apache.hadoop.hdfs.DistributedFileSystem.close(DistributedFileSystem.java:325)
              at org.apache.hadoop.fs.FileSystem$Cache.closeAll(FileSystem.java:1835)
              - locked <0x0000000107a77ec8> (a org.apache.hadoop.fs.FileSystem$Cache)
              at org.apache.hadoop.fs.FileSystem$Cache$ClientFinalizer.run(FileSystem.java:1851)
              - locked <0x00000001079daa00> (a org.apache.hadoop.fs.FileSystem$Cache$ClientFinalizer)
      "main":
              at org.apache.hadoop.hdfs.DFSClient$LeaseChecker.remove(DFSClient.java:1151)
              - waiting to lock <0x00000001082b0748> (a org.apache.hadoop.hdfs.DFSClient$LeaseChecker)
              at org.apache.hadoop.hdfs.DFSClient$DFSOutputStream.close(DFSClient.java:3609)
              - locked <0x0000000107a0ded0> (a org.apache.hadoop.hdfs.DFSClient$DFSOutputStream)
              at org.apache.hadoop.fs.FSDataOutputStream$PositionCache.close(FSDataOutputStream.java:61)
              at org.apache.hadoop.fs.FSDataOutputStream.close(FSDataOutputStream.java:86)
              at org.apache.hadoop.hdfs.TestFileAppend4.testAppend(TestFileAppend4.java:99)
              at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
              at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:39)
              at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:25)
              at java.lang.reflect.Method.invoke(Method.java:597)
              at junit.framework.TestCase.runTest(TestCase.java:168)
              at junit.framework.TestCase.runBare(TestCase.java:134)
              at junit.framework.TestResult$1.protect(TestResult.java:110)
              at junit.framework.TestResult.runProtected(TestResult.java:128)
              at junit.framework.TestResult.run(TestResult.java:113)
              at junit.framework.TestCase.run(TestCase.java:124)
              at junit.framework.TestSuite.runTest(TestSuite.java:232)
              at junit.framework.TestSuite.run(TestSuite.java:227)
              at junit.extensions.TestDecorator.basicRun(TestDecorator.java:24)
              at junit.extensions.TestSetup$1.protect(TestSetup.java:23)
              at junit.framework.TestResult.runProtected(TestResult.java:128)
              at junit.extensions.TestSetup.run(TestSetup.java:27)
              at org.apache.tools.ant.taskdefs.optional.junit.JUnitTestRunner.run(JUnitTestRunner.java:420)
              at org.apache.tools.ant.taskdefs.optional.junit.JUnitTestRunner.launch(JUnitTestRunner.java:911)
              at org.apache.tools.ant.taskdefs.optional.junit.JUnitTestRunner.main(JUnitTestRunner.java:768)
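
The cycle in the dump above is a lock-order inversion: "Thread-3" holds the LeaseChecker monitor and wants the DFSOutputStream monitor, while "main" holds the DFSOutputStream monitor and wants the LeaseChecker monitor. Below is a minimal sketch of the same pattern. It uses two plain objects as hypothetical stand-ins for the two monitors (no class or method name here comes from DFSClient) and ThreadMXBean.findMonitorDeadlockedThreads() to detect the cycle from inside the JVM.

```java
import java.lang.management.ManagementFactory;
import java.lang.management.ThreadMXBean;
import java.util.concurrent.CountDownLatch;

public class LockOrderInversionDemo {

    /**
     * Starts two daemon threads that take two monitors in opposite order
     * and returns how many monitor-deadlocked threads the JVM reports
     * (0 if none is seen within about five seconds).
     */
    static int reproduce() {
        // Hypothetical stand-ins for the DFSOutputStream and LeaseChecker monitors.
        final Object streamLock = new Object();
        final Object leaseLock = new Object();
        final CountDownLatch bothHoldFirst = new CountDownLatch(2);

        // "closer" mirrors Thread-3: lease monitor first, then stream monitor.
        Thread closer = new Thread(
            () -> lockInOrder(leaseLock, streamLock, bothHoldFirst), "closer");
        // "writer" mirrors main: stream monitor first, then lease monitor.
        Thread writer = new Thread(
            () -> lockInOrder(streamLock, leaseLock, bothHoldFirst), "writer");

        // Daemon threads, so the stuck pair does not keep the JVM alive.
        closer.setDaemon(true);
        writer.setDaemon(true);
        closer.start();
        writer.start();

        ThreadMXBean mx = ManagementFactory.getThreadMXBean();
        try {
            for (int i = 0; i < 100; i++) {
                long[] ids = mx.findMonitorDeadlockedThreads();
                if (ids != null) {
                    return ids.length;
                }
                Thread.sleep(50);
            }
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
        }
        return 0;
    }

    private static void lockInOrder(Object first, Object second, CountDownLatch latch) {
        synchronized (first) {
            latch.countDown();
            try {
                // Wait until the other thread also holds its first monitor.
                latch.await();
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt();
                return;
            }
            synchronized (second) {
                // Not reached: the other thread holds 'second' and waits for 'first'.
            }
        }
    }

    public static void main(String[] args) {
        System.out.println("deadlocked threads: " + reproduce());
    }
}
```

The latch only makes the interleaving deterministic for the demo. In the reported case the same interleaving happened by timing, when the shutdown hook and the test thread closed the same stream at about the same time.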
      
      1. deadlock.patch
        0.8 kB
        Hairong Kuang
      2. deadlock_0.21.patch
        0.8 kB
        Hairong Kuang

        Activity

        Hudson added a comment -

        Integrated in Hadoop-Hdfs-trunk #122 (See http://hudson.zones.apache.org/hudson/job/Hadoop-Hdfs-trunk/122/)
        Move the change log of from section 0.21 to section 0.20.2
        . Fix deadlock in DFSClient#DFSOutputStream. Contributed by Hairong Kuang.

        Hudson added a comment -

        Integrated in Hdfs-Patch-h5.grid.sp2.yahoo.net #78 (See http://hudson.zones.apache.org/hudson/job/Hdfs-Patch-h5.grid.sp2.yahoo.net/78/)

        Hairong Kuang added a comment -

        I've committed this!

        Hairong Kuang added a comment -

        I identified this bug when I ran the test in HDFS-728. With this patch, the test can run through.

        In branch 0.20, TestDatanodeBlockScanner times out with or without my fix. I filed HDFS-734 to track this.

        Hairong Kuang added a comment -

        Here is the patch for 0.21.

        Hudson added a comment -

        Integrated in Hdfs-Patch-h2.grid.sp2.yahoo.net #61 (See http://hudson.zones.apache.org/hudson/job/Hdfs-Patch-h2.grid.sp2.yahoo.net/61/)
        Move the change log of from section 0.21 to section 0.20.2
        . Fix deadlock in DFSClient#DFSOutputStream. Contributed by Hairong Kuang.

        Hudson added a comment -

        Integrated in Hadoop-Hdfs-trunk-Commit #84 (See http://hudson.zones.apache.org/hudson/job/Hadoop-Hdfs-trunk-Commit/84/)
        Move the change log of from section 0.21 to section 0.20.2

        Hudson added a comment -

        Integrated in Hadoop-Hdfs-trunk-Commit #83 (See http://hudson.zones.apache.org/hudson/job/Hadoop-Hdfs-trunk-Commit/83/)
        . Fix deadlock in DFSClient#DFSOutputStream. Contributed by Hairong Kuang.

        Hadoop QA added a comment -

        -1 overall. Here are the results of testing the latest attachment
        http://issues.apache.org/jira/secure/attachment/12422869/deadlock.patch
        against trunk revision 828900.

        +1 @author. The patch does not contain any @author tags.

        -1 tests included. The patch doesn't appear to include any new or modified tests.
        Please justify why no new tests are needed for this patch.
        Also please list what manual steps were performed to verify this patch.

        +1 javadoc. The javadoc tool did not generate any warning messages.

        +1 javac. The applied patch does not increase the total number of javac compiler warnings.

        +1 findbugs. The patch does not introduce any new Findbugs warnings.

        +1 release audit. The applied patch does not increase the total number of release audit warnings.

        -1 core tests. The patch failed core unit tests.

        +1 contrib tests. The patch passed contrib unit tests.

        Test results: http://hudson.zones.apache.org/hudson/job/Hdfs-Patch-h2.grid.sp2.yahoo.net/53/testReport/
        Findbugs warnings: http://hudson.zones.apache.org/hudson/job/Hdfs-Patch-h2.grid.sp2.yahoo.net/53/artifact/trunk/build/test/findbugs/newPatchFindbugsWarnings.html
        Checkstyle results: http://hudson.zones.apache.org/hudson/job/Hdfs-Patch-h2.grid.sp2.yahoo.net/53/artifact/trunk/build/test/checkstyle-errors.html
        Console output: http://hudson.zones.apache.org/hudson/job/Hdfs-Patch-h2.grid.sp2.yahoo.net/53/console

        This message is automatically generated.

        Suresh Srinivas added a comment -

        +1 for the patch. Since the existing test caught this, it does not need a separate test, right?

        Hairong Kuang added a comment -

        Here is a patch that changes LeaseChecker.close() so that it does not require a lock on LeaseChecker to close a file output stream.
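
One common way to get that effect is to copy the set of open streams while holding the lock, release the lock, and only then close each stream. The sketch below illustrates that idea and is not the attached patch. The class names, the `open` helper, and the map layout are assumptions made for the example.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

public class LeaseCheckerSketch {

    /** Hypothetical stand-in for DFSOutputStream. */
    static class Stream {
        private final LeaseCheckerSketch checker;
        private final String src;
        private boolean closed;

        Stream(LeaseCheckerSketch checker, String src) {
            this.checker = checker;
            this.src = src;
        }

        // Holds the stream monitor, then takes the checker monitor in remove().
        synchronized void close() {
            if (closed) {
                return;
            }
            closed = true;
            checker.remove(src);
        }

        synchronized boolean isClosed() {
            return closed;
        }
    }

    // Open streams keyed by path; guarded by the checker's own monitor.
    private final Map<String, Stream> pendingCreates = new HashMap<>();

    synchronized Stream open(String src) {
        Stream s = new Stream(this, src);
        pendingCreates.put(src, s);
        return s;
    }

    synchronized void remove(String src) {
        pendingCreates.remove(src);
    }

    synchronized int size() {
        return pendingCreates.size();
    }

    // Not synchronized as a whole: the checker monitor is held only while
    // taking the snapshot, never while a stream monitor is being acquired.
    void close() {
        List<Stream> toClose;
        synchronized (this) {
            toClose = new ArrayList<>(pendingCreates.values());
            pendingCreates.clear();
        }
        for (Stream s : toClose) {
            s.close();
        }
    }

    public static void main(String[] args) {
        LeaseCheckerSketch checker = new LeaseCheckerSketch();
        Stream a = checker.open("/a");
        Stream b = checker.open("/b");
        checker.close();
        System.out.println(a.isClosed() + " " + b.isClosed() + " " + checker.size());
    }
}
```

With this shape, no thread waits for a stream monitor while holding the checker monitor, so the only nested acquisition left is stream monitor first, checker monitor second, and the cycle from the thread dump cannot form.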


          People

          • Assignee:
            Hairong Kuang
            Reporter:
            Hairong Kuang
          • Votes:
            0
            Watchers:
            4
