Hadoop HDFS / HDFS-863

Potential deadlock in TestOverReplicatedBlocks

    Details

    • Type: Bug
    • Status: Resolved
    • Priority: Major
    • Resolution: Fixed
    • Affects Version/s: None
    • Fix Version/s: 0.23.0
    • Component/s: test
    • Labels: None
    • Hadoop Flags: Reviewed

      Description

      TestOverReplicatedBlocks.testProcesOverReplicateBlock synchronizes on namesystem.heartbeats without synchronizing on namesystem first. Other places in the code synchronize on namesystem first, then on heartbeats. A deadlock is probably unlikely to occur in this test case, but it's a simple fix.
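
The ordering rule described above can be illustrated with a minimal Java sketch. It is not the Hadoop code: `LockOrderSketch`, `registerNode`, and `countNodes` are hypothetical names, and plain monitors stand in for the namesystem and heartbeats locks. The point is only that a deadlock needs two threads to take the same pair of locks in opposite orders, so if every path takes the outer lock before the inner one, no such cycle can form.

```java
import java.util.ArrayList;
import java.util.List;

/** Hypothetical stand-in for the namesystem; all names here are illustrative. */
class LockOrderSketch {
  private final Object namesystem = new Object();             // outer lock
  private final List<String> heartbeats = new ArrayList<>();  // inner lock

  /** Mirrors "other places in the code": namesystem first, then heartbeats. */
  void registerNode(String node) {
    synchronized (namesystem) {
      synchronized (heartbeats) {
        heartbeats.add(node);
      }
    }
  }

  /** Test-side access that follows the same order as registerNode. */
  int countNodes() {
    synchronized (namesystem) {
      synchronized (heartbeats) {
        return heartbeats.size();
      }
    }
  }

  public static void main(String[] args) throws InterruptedException {
    LockOrderSketch s = new LockOrderSketch();
    Thread writer = new Thread(() -> {
      for (int i = 0; i < 1000; i++) {
        s.registerNode("dn-" + i);
      }
    });
    Thread reader = new Thread(() -> {
      for (int i = 0; i < 1000; i++) {
        s.countNodes();
      }
    });
    writer.start();
    reader.start();
    writer.join();
    reader.join();
    System.out.println(s.countNodes()); // prints 1000
  }
}
```

Because both methods acquire the locks in the same order, the two threads can contend but cannot block each other in a cycle.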

      1. cycle.png
        37 kB
        Todd Lipcon
      2. TestNodeCount.png
        34 kB
        Todd Lipcon
      3. HDFS-863.patch
        3 kB
        Ken Goodhope
      4. HDFS-863.patch
        9 kB
        Ken Goodhope
      5. HDFS-863.patch
        9 kB
        Ken Goodhope
      6. HDFS-863.patch
        10 kB
        Ken Goodhope

        Activity

        Todd Lipcon added a comment -

        Same thing occurs in TestNodeCount

        Konstantin Shvachko added a comment -

        Is it actually breaking builds at this point? Does findbugs raise warnings about it? If so, we should mark it a blocker.

        Todd Lipcon added a comment -

        Nope, I've never seen it actually deadlock a test. JCarder just reports it as a potential deadlock spot if the threads interleaved differently.

        Konstantin Shvachko added a comment -

        Moved to 0.23

        Ken Goodhope added a comment -

        Added FSNameSystem.write{Lock|Unlock} around the synchronized heartbeats. Added the same fix for TestHeartbeatHandling and TestNodeCount.

        Hadoop QA added a comment -

        -1 overall. Here are the results of testing the latest attachment
        http://issues.apache.org/jira/secure/attachment/12468210/HDFS-863.patch
        against trunk revision 1057414.

        +1 @author. The patch does not contain any @author tags.

        +1 tests included. The patch appears to include 9 new or modified tests.

        +1 javadoc. The javadoc tool did not generate any warning messages.

        +1 javac. The applied patch does not increase the total number of javac compiler warnings.

        +1 findbugs. The patch does not introduce any new Findbugs (version 1.3.9) warnings.

        +1 release audit. The applied patch does not increase the total number of release audit warnings.

        -1 core tests. The patch failed these core unit tests:

        -1 contrib tests. The patch failed contrib unit tests.

        +1 system test framework. The patch passed system test framework compile.

        Test results: https://hudson.apache.org/hudson/job/PreCommit-HDFS-Build/103//testReport/
        Findbugs warnings: https://hudson.apache.org/hudson/job/PreCommit-HDFS-Build/103//artifact/trunk/build/test/findbugs/newPatchFindbugsWarnings.html
        Console output: https://hudson.apache.org/hudson/job/PreCommit-HDFS-Build/103//console

        This message is automatically generated.

        Ken Goodhope added a comment -

        The results of running ant test.
        Skipped TestLargeDirectoryDelete since it stalled during that test.
        [junit] Test org.apache.hadoop.hdfs.TestHDFSServerPorts FAILED
        [junit] Test org.apache.hadoop.hdfs.server.namenode.TestStorageRestore FAILED

        BUILD FAILED
        /home/kgoodhope/workspace/hadoop-hdfs-trunk/build.xml:735: Tests failed!

        Total time: 100 minutes 3 seconds

        Todd Lipcon added a comment -

        It seems the new writeUnlock() calls should be inside a finally {} clause, no? Otherwise I fear we might get a test timeout if one of these functions throws an exception, since the minicluster wouldn't be able to shut down if the write lock is left held.
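
The pattern being requested can be sketched as follows. This is an assumption-laden illustration rather than the committed patch: `WriteLockSketch` and `withLocks` are hypothetical, and a `ReentrantReadWriteLock` stands in for whatever backs the namesystem's writeLock()/writeUnlock() methods. It shows the write lock taken first, the heartbeats monitor second, and the unlock placed in `finally` so that an exception in the test body cannot leave the lock held.

```java
import java.util.concurrent.locks.ReentrantReadWriteLock;

/** Hypothetical stand-in; the real FSNamesystem is not reproduced here. */
class WriteLockSketch {
  private final ReentrantReadWriteLock fsLock = new ReentrantReadWriteLock();
  private final Object heartbeats = new Object();

  void writeLock() {
    fsLock.writeLock().lock();
  }

  void writeUnlock() {
    fsLock.writeLock().unlock();
  }

  boolean isWriteLocked() {
    return fsLock.isWriteLocked();
  }

  /** Runs body under the write lock, then the heartbeats monitor. */
  void withLocks(Runnable body) {
    writeLock();
    try {
      synchronized (heartbeats) {
        body.run();
      }
    } finally {
      // Runs even if body throws, so a later shutdown can still take the lock.
      writeUnlock();
    }
  }

  public static void main(String[] args) {
    WriteLockSketch ns = new WriteLockSketch();
    try {
      ns.withLocks(() -> {
        throw new IllegalStateException("simulated test failure");
      });
    } catch (IllegalStateException expected) {
      // The exception propagates, but the lock has already been released.
    }
    System.out.println("write lock still held: " + ns.isWriteLocked()); // prints false
  }
}
```

Without the `finally`, the simulated failure would skip the unlock and any later thread needing the write lock would wait indefinitely, which is the timeout scenario raised in the comment.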

        Ken Goodhope added a comment -

        Agreed, and done. Reran tests with the following results:

        [junit] Test org.apache.hadoop.hdfs.server.namenode.TestStorageRestore FAILED
        [junit] Test org.apache.hadoop.hdfs.TestFileConcurrentReader FAILED
        [junit] Test org.apache.hadoop.hdfs.server.namenode.TestNNThroughputBenchmark FAILED

        Skipped src/test/hdfs/org/apache/hadoop/hdfs/server/namenode/TestLargeDirectoryDelete.java since last time that test stalled.

        Todd Lipcon added a comment -

        Hi Ken. Looks good except for one nit: there are some hard tab characters (e.g. TestNodeCount.java:117). The style guide is 2-space indentation. Mind reformatting those?

        Ken Goodhope added a comment -

        Thought I caught all of those, but obviously not. Found a few more and removed them as well. Thanks.

        Ken Goodhope added a comment -

        Finally got my auto format set up right and it found a couple more issues my eyes missed. Only formatted the sections I worked on.

        Jakob Homan added a comment -

        +1

        Jakob Homan added a comment -

        Trunk isn't compiling due to a Common dependency, but by manually installing a new Common jar into the cache, I verified that this compiles. I've committed this. Resolving as fixed. Thanks, Ken!

        Hudson added a comment -

        Integrated in Hadoop-Hdfs-trunk-Commit #539 (See https://hudson.apache.org/hudson/job/Hadoop-Hdfs-trunk-Commit/539/)

        Hudson added a comment -

        Integrated in Hadoop-Hdfs-trunk #643 (See https://builds.apache.org/hudson/job/Hadoop-Hdfs-trunk/643/)


          People

          • Assignee: Ken Goodhope
          • Reporter: Todd Lipcon
          • Votes: 0
          • Watchers: 6
