Hadoop Map/Reduce: MAPREDUCE-2364

Shouldn't hold lock on rjob while localizing resources.

    Details

    • Type: Bug
    • Status: Closed
    • Priority: Major
    • Resolution: Fixed
    • Affects Version/s: 0.20.203.0
    • Fix Version/s: 0.20.204.0
    • Component/s: tasktracker
    • Labels:
      None

      Description

      There is a deadlock while localizing resources on the TaskTracker.

      1. MAPREDUCE-2364.patch (0.9 kB, Binglin Chang)
      2. no-lock-localize-branch-0.20-security.patch (6 kB, Devaraj Das)
      3. no-lock-localize-trunk.patch (5 kB, Binglin Chang)

          Activity

          Binglin Chang added a comment -

          We encountered the same problem: when the TaskTracker downloads and unjars a very big job.jar in localizeJob(), it stops sending heartbeats and the web service hangs too.
          Our solution for this issue is to add a new lock, called localizing, to the RunningJob class. Instead of holding the whole rjob lock during localization, only rjob.localizing is locked.
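
The locking idea described in this comment can be sketched roughly as follows. This is a minimal illustration, not the code from any of the attached patches: the class LocalizeSketch, the localized flag, the slowWork parameter, and the demo() helper are all hypothetical names chosen for the example. The assumption is that slow work (download and unjar) runs under a dedicated localizing lock, while the rjob monitor is taken only for short state checks and updates, so threads such as the heartbeat can still synchronize on rjob without waiting.

```java
import java.util.concurrent.CountDownLatch;

// Hypothetical sketch; not the code from the attached patches.
class LocalizeSketch {

    /** Stand-in for the TaskTracker's RunningJob (fields are illustrative). */
    static class RunningJob {
        // Dedicated lock that serializes localization of this job only.
        final Object localizing = new Object();
        // Guarded by the RunningJob monitor itself.
        boolean localized = false;
    }

    /** Short critical section on rjob, as a heartbeat-like thread would use. */
    static boolean isLocalized(RunningJob rjob) {
        synchronized (rjob) {
            return rjob.localized;
        }
    }

    /** Localizes the job without holding the rjob monitor during slow work. */
    static void localizeJob(RunningJob rjob, Runnable slowWork) {
        synchronized (rjob.localizing) {
            if (isLocalized(rjob)) {
                return; // another thread already finished localization
            }
            slowWork.run(); // download + unjar; the rjob monitor is NOT held here
            synchronized (rjob) {
                rjob.localized = true;
            }
        }
    }

    /** Returns {state seen while localizing, state seen afterwards}. */
    static boolean[] demo() {
        RunningJob rjob = new RunningJob();
        CountDownLatch started = new CountDownLatch(1);
        CountDownLatch release = new CountDownLatch(1);
        Thread worker = new Thread(() -> localizeJob(rjob, () -> {
            started.countDown();
            awaitQuietly(release); // simulate a long download
        }));
        worker.start();
        awaitQuietly(started);
        // Taking the rjob monitor here does not wait for the download to end.
        boolean during = isLocalized(rjob);
        release.countDown();
        try {
            worker.join();
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            throw new IllegalStateException(e);
        }
        return new boolean[] { during, isLocalized(rjob) };
    }

    private static void awaitQuietly(CountDownLatch latch) {
        try {
            latch.await();
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            throw new IllegalStateException(e);
        }
    }

    public static void main(String[] args) {
        boolean[] r = demo();
        System.out.println("during=" + r[0] + " after=" + r[1]);
    }
}
```

In this sketch the two locks are always acquired in the same order (localizing first, then rjob), which avoids introducing a new lock-ordering deadlock between them.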

          Binglin Chang added a comment -

          trunk patch

          Hadoop QA added a comment -

          -1 overall. Here are the results of testing the latest attachment
          http://issues.apache.org/jira/secure/attachment/12480768/MAPREDUCE-2364.patch
          against trunk revision 1129771.

          +1 @author. The patch does not contain any @author tags.

          -1 tests included. The patch doesn't appear to include any new or modified tests.
          Please justify why no new tests are needed for this patch.
          Also please list what manual steps were performed to verify this patch.

          +1 javadoc. The javadoc tool did not generate any warning messages.

          +1 javac. The applied patch does not increase the total number of javac compiler warnings.

          -1 findbugs. The patch appears to introduce 1 new Findbugs (version 1.3.9) warnings.

          +1 release audit. The applied patch does not increase the total number of release audit warnings.

          -1 core tests. The patch failed these core unit tests:
          org.apache.hadoop.cli.TestMRCLI

          -1 contrib tests. The patch failed contrib unit tests.

          +1 system test framework. The patch passed system test framework compile.

          Test results: https://builds.apache.org/hudson/job/PreCommit-MAPREDUCE-Build/328//testReport/
          Findbugs warnings: https://builds.apache.org/hudson/job/PreCommit-MAPREDUCE-Build/328//artifact/trunk/build/test/findbugs/newPatchFindbugsWarnings.html
          Console output: https://builds.apache.org/hudson/job/PreCommit-MAPREDUCE-Build/328//console

          This message is automatically generated.

          Devaraj Das added a comment -

          Hi Binglin, I thought I'd attach the patch that I did for branch-0.20-security. The crux of the patch you submitted and the one I did is mostly the same.
          Please have a look at this one, and see if you can map it to a trunk patch. Thanks!

          Hadoop QA added a comment -

          -1 overall. Here are the results of testing the latest attachment
          http://issues.apache.org/jira/secure/attachment/12481414/no-lock-localize-branch-0.20-security.patch
          against trunk revision 1131265.

          +1 @author. The patch does not contain any @author tags.

          -1 tests included. The patch doesn't appear to include any new or modified tests.
          Please justify why no new tests are needed for this patch.
          Also please list what manual steps were performed to verify this patch.

          -1 patch. The patch command could not apply the patch.

          Console output: https://builds.apache.org/hudson/job/PreCommit-MAPREDUCE-Build/345//console

          This message is automatically generated.

          Binglin Chang added a comment -

          trunk patch

          Hadoop QA added a comment -

          -1 overall. Here are the results of testing the latest attachment
          http://issues.apache.org/jira/secure/attachment/12481449/no-lock-localize-trunk.patch
          against trunk revision 1131265.

          +1 @author. The patch does not contain any @author tags.

          -1 tests included. The patch doesn't appear to include any new or modified tests.
          Please justify why no new tests are needed for this patch.
          Also please list what manual steps were performed to verify this patch.

          +1 javadoc. The javadoc tool did not generate any warning messages.

          +1 javac. The applied patch does not increase the total number of javac compiler warnings.

          -1 findbugs. The patch appears to introduce 1 new Findbugs (version 1.3.9) warnings.

          +1 release audit. The applied patch does not increase the total number of release audit warnings.

          -1 core tests. The patch failed these core unit tests:
          org.apache.hadoop.cli.TestMRCLI

          +1 contrib tests. The patch passed contrib unit tests.

          +1 system test framework. The patch passed system test framework compile.

          Test results: https://builds.apache.org/hudson/job/PreCommit-MAPREDUCE-Build/348//testReport/
          Findbugs warnings: https://builds.apache.org/hudson/job/PreCommit-MAPREDUCE-Build/348//artifact/trunk/build/test/findbugs/newPatchFindbugsWarnings.html
          Console output: https://builds.apache.org/hudson/job/PreCommit-MAPREDUCE-Build/348//console

          This message is automatically generated.

          Liyin Liang added a comment -

          I think this issue is the same as MAPREDUCE-2209.

          Subroto Sanyal added a comment -

          Hi Devaraj,
          MAPREDUCE-2209 also resolves the same issue. In addition, MAPREDUCE-2209 aims to fix one more case of thread blocking.
          Could you please look at the MAPREDUCE-2209 patch? The patch provided in the issue is for version 0.23.

          Devaraj Das added a comment -

          Subroto, I see a significant difference between the patches attached to MAPREDUCE-2209 and the last one here. I'll need to look at the details, but if you have time, could you please take a look at the patch attached here and see if it makes sense? (This patch predates the patch on MAPREDUCE-2209; I am sorry that I didn't look at the patch here earlier.)

          Owen O'Malley added a comment -

          Hadoop 0.20.204.0 was just released.


            People

            • Assignee: Devaraj Das
            • Reporter: Owen O'Malley
            • Votes: 0
            • Watchers: 8
