Details

    • Type: Bug Bug
    • Status: Closed
    • Priority: Critical Critical
    • Resolution: Fixed
    • Affects Version/s: 0.22.0
    • Fix Version/s: 0.22.0
    • Component/s: tasktracker
    • Labels:
      None
    • Hadoop Flags:
      Reviewed

      Description

      JvmManager.JvmManagerForType has several HashMap members that are inconsistently synchronized. I've seen sporadic NPEs in the 0.20 version of this code which has similar bugs.

      1. mapreduce-2224.txt
        14 kB
        Todd Lipcon
      2. mapreduce-2224.txt
        14 kB
        Todd Lipcon
      3. mapreduce-2224.txt
        11 kB
        Todd Lipcon
      4. mapreduce-2224.txt
        3 kB
        Todd Lipcon
      5. mapreduce-2224-cdh3.txt
        10 kB
        Todd Lipcon
      6. mr-2224-0.22.patch
        7 kB
        Benoy Antony
      7. mr-2224-0.22.patch
        7 kB
        Benoy Antony

        Issue Links

          Activity

          Hide
          Todd Lipcon added a comment -

          Attaching backport I did for CDH 20 branch which should also apply on YDH. Includes unit test. Will port unit test forward also

          Show
          Todd Lipcon added a comment - Attaching backport I did for CDH 20 branch which should also apply on YDH. Includes unit test. Will port unit test forward also
          Hide
          Todd Lipcon added a comment -

          Updated patch for trunk to include unit test. Also I had missed one synchronization case which the test found.

          Show
          Todd Lipcon added a comment - Updated patch for trunk to include unit test. Also I had missed one synchronization case which the test found.
          Hide
          Scott Chen added a comment -

          Good catch!
          I have a question. Is the following part necessary? The method kill() is already synchronized.

          @@ -493,7 +501,9 @@ class JvmManager {
                     // Check inital context before issuing a kill to prevent situations
                     // where kill is issued before task is launched.
                     if (initalContext != null && initalContext.env != null) {
          -            initalContext.pid = jvmIdToPid.get(jvmId);
          +            synchronized (JvmManagerForType.this) {
          +              initalContext.pid = jvmIdToPid.get(jvmId);
          +            }
          
          Show
          Scott Chen added a comment - Good catch! I have a question. Is the following part necessary? The method kill() is already synchronized. @@ -493,7 +501,9 @@ class JvmManager { // Check inital context before issuing a kill to prevent situations // where kill is issued before task is launched. if (initalContext != null && initalContext.env != null ) { - initalContext.pid = jvmIdToPid.get(jvmId); + synchronized (JvmManagerForType. this ) { + initalContext.pid = jvmIdToPid.get(jvmId); + }
          Hide
          Todd Lipcon added a comment -

          Hi Scott. Yea, it's necessary, since kill() is synchronized on the inner class. It needs to synchronize on its outer "this" - note that it's JvmManagerForType.this and not this

          Show
          Todd Lipcon added a comment - Hi Scott. Yea, it's necessary, since kill() is synchronized on the inner class. It needs to synchronize on its outer "this" - note that it's JvmManagerForType.this and not this
          Hide
          Scott Chen added a comment -

          Hey Todd, Thanks. Got it. I missed that there is this inner class of a inner class of a class.

          I have another question.
          It seems that now following two code paths will have two different locking sequences for JvmRunner and JvmManagerForType.
          Is it possible that this may deadlock?

          JvmManagerForType.killJvmRunner() -> JvmRunner.kill()
          JvmRunner.runChild() -> JvmRunner.kill()
          
          Show
          Scott Chen added a comment - Hey Todd, Thanks. Got it. I missed that there is this inner class of a inner class of a class. I have another question. It seems that now following two code paths will have two different locking sequences for JvmRunner and JvmManagerForType. Is it possible that this may deadlock? JvmManagerForType.killJvmRunner() -> JvmRunner.kill() JvmRunner.runChild() -> JvmRunner.kill()
          Hide
          Todd Lipcon added a comment -

          Good call, let me look into that.

          Show
          Todd Lipcon added a comment - Good call, let me look into that.
          Hide
          Todd Lipcon added a comment -

          Hi Scott. That was a good catch - jcarder confirmed the potential deadlock you saw. Here's a patch which is a bit bigger but should be deadlock free and cleans up some encapsulation issues as well in the process.

          Show
          Todd Lipcon added a comment - Hi Scott. That was a good catch - jcarder confirmed the potential deadlock you saw. Here's a patch which is a bit bigger but should be deadlock free and cleans up some encapsulation issues as well in the process.
          Hide
          Todd Lipcon added a comment -
          Show
          Todd Lipcon added a comment - review here: https://reviews.apache.org/r/190/diff/#index_header
          Hide
          Scott Chen added a comment -

          Hi Todd, I like the patch. Just have some minor comments in the review board.

          Show
          Scott Chen added a comment - Hi Todd, I like the patch. Just have some minor comments in the review board.
          Hide
          Todd Lipcon added a comment -

          new revision from review board

          Show
          Todd Lipcon added a comment - new revision from review board
          Hide
          Scott Chen added a comment -

          +1 Looks good.
          Todd: Can you run ant test and post it here? I think Hudson doesn't work.

          Show
          Scott Chen added a comment - +1 Looks good. Todd: Can you run ant test and post it here? I think Hudson doesn't work.
          Hide
          Todd Lipcon added a comment -

          [exec] +1 overall.
          [exec]
          [exec] +1 @author. The patch does not contain any @author tags.
          [exec]
          [exec] +1 tests included. The patch appears to include 3 new or modified tests.
          [exec]
          [exec] +1 javadoc. The javadoc tool did not generate any warning messages.
          [exec]
          [exec] +1 javac. The applied patch does not increase the total number of javac compiler warnings.
          [exec]
          [exec] +1 findbugs. The patch does not introduce any new Findbugs (version 1.3.9) warnings.
          [exec]
          [exec] +1 release audit. The applied patch does not increase the total number of release audit warnings.
          [exec]
          [exec] +1 system test framework. The patch passed system test framework compile.

          Failed tests:
          [junit] Test org.apache.hadoop.mapred.TestControlledMapReduceJob FAILED (timeout)
          [junit] Test org.apache.hadoop.mapreduce.lib.db.TestDBJob FAILED (timeout)
          [junit] Test org.apache.hadoop.mapreduce.lib.db.TestDataDrivenDBInputFormat FAILED (timeout)

          which I believe are all failing on trunk

          Show
          Todd Lipcon added a comment - [exec] +1 overall. [exec] [exec] +1 @author. The patch does not contain any @author tags. [exec] [exec] +1 tests included. The patch appears to include 3 new or modified tests. [exec] [exec] +1 javadoc. The javadoc tool did not generate any warning messages. [exec] [exec] +1 javac. The applied patch does not increase the total number of javac compiler warnings. [exec] [exec] +1 findbugs. The patch does not introduce any new Findbugs (version 1.3.9) warnings. [exec] [exec] +1 release audit. The applied patch does not increase the total number of release audit warnings. [exec] [exec] +1 system test framework. The patch passed system test framework compile. Failed tests: [junit] Test org.apache.hadoop.mapred.TestControlledMapReduceJob FAILED (timeout) [junit] Test org.apache.hadoop.mapreduce.lib.db.TestDBJob FAILED (timeout) [junit] Test org.apache.hadoop.mapreduce.lib.db.TestDataDrivenDBInputFormat FAILED (timeout) which I believe are all failing on trunk
          Hide
          Todd Lipcon added a comment -

          Committed to trunk and branch-22. Thanks for reviewing, Scott.

          Show
          Todd Lipcon added a comment - Committed to trunk and branch-22. Thanks for reviewing, Scott.
          Hide
          Hudson added a comment -

          Integrated in Hadoop-Mapreduce-trunk-Commit #564 (See https://hudson.apache.org/hudson/job/Hadoop-Mapreduce-trunk-Commit/564/)
          MAPREDUCE-2224. Fix synchronization bugs in JvmManager. Contributed by Todd Lipcon

          Show
          Hudson added a comment - Integrated in Hadoop-Mapreduce-trunk-Commit #564 (See https://hudson.apache.org/hudson/job/Hadoop-Mapreduce-trunk-Commit/564/ ) MAPREDUCE-2224 . Fix synchronization bugs in JvmManager. Contributed by Todd Lipcon
          Hide
          Hudson added a comment -

          Integrated in Hadoop-Mapreduce-22-branch #33 (See https://hudson.apache.org/hudson/job/Hadoop-Mapreduce-22-branch/33/)

          Show
          Hudson added a comment - Integrated in Hadoop-Mapreduce-22-branch #33 (See https://hudson.apache.org/hudson/job/Hadoop-Mapreduce-22-branch/33/ )
          Hide
          Hudson added a comment -

          Integrated in Hadoop-Mapreduce-trunk #643 (See https://hudson.apache.org/hudson/job/Hadoop-Mapreduce-trunk/643/)

          Show
          Hudson added a comment - Integrated in Hadoop-Mapreduce-trunk #643 (See https://hudson.apache.org/hudson/job/Hadoop-Mapreduce-trunk/643/ )
          Hide
          Benoy Antony added a comment -

          Patch for 0.22

          Show
          Benoy Antony added a comment - Patch for 0.22
          Hide
          Benoy Antony added a comment -

          Updated the patch to reflect the changes to JvmManager due to MR-2178

          Show
          Benoy Antony added a comment - Updated the patch to reflect the changes to JvmManager due to MR-2178
          Hide
          Konstantin Shvachko added a comment -

          This is already committed to 0.22 and should go into a separate jira.
          Created MAPREDUCE-4314 to commit the patch.

          Show
          Konstantin Shvachko added a comment - This is already committed to 0.22 and should go into a separate jira. Created MAPREDUCE-4314 to commit the patch.

            People

            • Assignee:
              Todd Lipcon
              Reporter:
              Todd Lipcon
            • Votes:
              0 Vote for this issue
              Watchers:
              10 Start watching this issue

              Dates

              • Created:
                Updated:
                Resolved:

                Development