Details

    • Type: Bug Bug
    • Status: Closed
    • Priority: Major Major
    • Resolution: Fixed
    • Affects Version/s: 0.21.0, 0.22.0, 0.23.0
    • Fix Version/s: 0.20.204.0, 0.23.0
    • Component/s: tasktracker
    • Labels:
      None

      Description

      Currently TT doens't check to ensure jvmid is relevant during communication with the Child via TaskUmbilicalProtocol.

      1. MR2429_trunk.patch
        43 kB
        Siddharth Seth
      2. MR2429.patch
        40 kB
        Siddharth Seth
      3. MR2429-1.patch
        39 kB
        Arun C Murthy

        Issue Links

          Activity

          Hide
          Owen O'Malley added a comment -

          Hadoop 0.20.204.0 was just released.

          Show
          Owen O'Malley added a comment - Hadoop 0.20.204.0 was just released.
          Hide
          Siddharth Seth added a comment -

          I have a partial patch for the MR-279 branch. trunk has some additional changes which the patch uses. Instead of incorporating bits from other patches into mr-279, will just work on a combined trunk/mr-279 patch once the merge happens.

          Show
          Siddharth Seth added a comment - I have a partial patch for the MR-279 branch. trunk has some additional changes which the patch uses. Instead of incorporating bits from other patches into mr-279, will just work on a combined trunk/mr-279 patch once the merge happens.
          Hide
          Todd Lipcon added a comment -

          ah, thanks Sid

          Show
          Todd Lipcon added a comment - ah, thanks Sid
          Hide
          Siddharth Seth added a comment -

          MR2447 set the jvmContext earlier in the 20s branch (0.20.204).

          Show
          Siddharth Seth added a comment - MR2447 set the jvmContext earlier in the 20s branch (0.20.204).
          Hide
          Todd Lipcon added a comment -

          The version of this patch on 0.20-security seems to have a bug in that it doesn't call task.setJvmContext() until after the syncLogs call, etc. Thus, if syncLogs fails, jvmContext is still null, and the status update will have null Jvmcontext, causing NPE. The patch against trunk seems to have fixed this by calling setJvmContext at the top.

          Show
          Todd Lipcon added a comment - The version of this patch on 0.20-security seems to have a bug in that it doesn't call task.setJvmContext() until after the syncLogs call, etc. Thus, if syncLogs fails, jvmContext is still null, and the status update will have null Jvmcontext, causing NPE. The patch against trunk seems to have fixed this by calling setJvmContext at the top.
          Hide
          Siddharth Seth added a comment -

          The test failures should not be related ... cannot access the build system right now to verify. A local ant test-patch was successful.
          The trunk patch includes MR2443, MR2555 and MR2447
          Will work on a patch for the MR-279 branch. Thanks for the input Sharad.

          Show
          Siddharth Seth added a comment - The test failures should not be related ... cannot access the build system right now to verify. A local ant test-patch was successful. The trunk patch includes MR2443, MR2555 and MR2447 Will work on a patch for the MR-279 branch. Thanks for the input Sharad.
          Hide
          Hadoop QA added a comment -

          -1 overall. Here are the results of testing the latest attachment
          http://issues.apache.org/jira/secure/attachment/12482514/MR2429_trunk.patch
          against trunk revision 1135462.

          +1 @author. The patch does not contain any @author tags.

          +1 tests included. The patch appears to include 15 new or modified tests.

          +1 javadoc. The javadoc tool did not generate any warning messages.

          +1 javac. The applied patch does not increase the total number of javac compiler warnings.

          +1 findbugs. The patch does not introduce any new Findbugs (version 1.3.9) warnings.

          +1 release audit. The applied patch does not increase the total number of release audit warnings.

          -1 core tests. The patch failed these core unit tests:
          org.apache.hadoop.cli.TestMRCLI
          org.apache.hadoop.fs.TestFileSystem

          -1 contrib tests. The patch failed contrib unit tests.

          +1 system test framework. The patch passed system test framework compile.

          Test results: https://builds.apache.org/job/PreCommit-MAPREDUCE-Build/393//testReport/
          Findbugs warnings: https://builds.apache.org/job/PreCommit-MAPREDUCE-Build/393//artifact/trunk/build/test/findbugs/newPatchFindbugsWarnings.html
          Console output: https://builds.apache.org/job/PreCommit-MAPREDUCE-Build/393//console

          This message is automatically generated.

          Show
          Hadoop QA added a comment - -1 overall. Here are the results of testing the latest attachment http://issues.apache.org/jira/secure/attachment/12482514/MR2429_trunk.patch against trunk revision 1135462. +1 @author. The patch does not contain any @author tags. +1 tests included. The patch appears to include 15 new or modified tests. +1 javadoc. The javadoc tool did not generate any warning messages. +1 javac. The applied patch does not increase the total number of javac compiler warnings. +1 findbugs. The patch does not introduce any new Findbugs (version 1.3.9) warnings. +1 release audit. The applied patch does not increase the total number of release audit warnings. -1 core tests. The patch failed these core unit tests: org.apache.hadoop.cli.TestMRCLI org.apache.hadoop.fs.TestFileSystem -1 contrib tests. The patch failed contrib unit tests. +1 system test framework. The patch passed system test framework compile. Test results: https://builds.apache.org/job/PreCommit-MAPREDUCE-Build/393//testReport/ Findbugs warnings: https://builds.apache.org/job/PreCommit-MAPREDUCE-Build/393//artifact/trunk/build/test/findbugs/newPatchFindbugsWarnings.html Console output: https://builds.apache.org/job/PreCommit-MAPREDUCE-Build/393//console This message is automatically generated.
          Hide
          Sharad Agarwal added a comment -

          The minor fix required in TaskAttemptImpl state machine is to ignore the TA_DONE event in SUCCEEDED state.

          Correction: ignore the TA_UPDATE event in SUCCEEDED state.

          Show
          Sharad Agarwal added a comment - The minor fix required in TaskAttemptImpl state machine is to ignore the TA_DONE event in SUCCEEDED state. Correction: ignore the TA_UPDATE event in SUCCEEDED state.
          Hide
          Sharad Agarwal added a comment -

          In MR AM, getting status updates from a Task even after DONE call is ok. The minor fix required in TaskAttemptImpl state machine is to ignore the TA_DONE event in SUCCEEDED state. This will lead to ignoring of status updates from a task once it sends the done.

          Show
          Sharad Agarwal added a comment - In MR AM, getting status updates from a Task even after DONE call is ok. The minor fix required in TaskAttemptImpl state machine is to ignore the TA_DONE event in SUCCEEDED state. This will lead to ignoring of status updates from a task once it sends the done.
          Hide
          Arun C Murthy added a comment -

          Sid, please ensure that this patch is committed to MR-279 too. This will mean some changes to the MR AM, so sync with Sharad. Thanks!

          Show
          Arun C Murthy added a comment - Sid, please ensure that this patch is committed to MR-279 too. This will mean some changes to the MR AM, so sync with Sharad. Thanks!
          Hide
          Siddharth Seth added a comment -

          A little more context on the reason for the patch.

          If a task is Killed (speculation or other reasons), there's a possibility that the TaskTracker will end up losing track of the slot(s) used by the task. The jobTracker will continue assigning tasks to the TT though.

          The attemptId is re-used for cleanup tasks, there's a race where if there is a delay in processing the SIGTERM sent to the child, the TT will allocate slots for the CLEANUP task but will not actually launch it and release the slots. (The old task sends in a done / commitPending which is applied to the TIP of the cleanup task).

          The patch validates the caller jvm - for all calls in TaskUmbilicalProtocol.

          Show
          Siddharth Seth added a comment - A little more context on the reason for the patch. If a task is Killed (speculation or other reasons), there's a possibility that the TaskTracker will end up losing track of the slot(s) used by the task. The jobTracker will continue assigning tasks to the TT though. The attemptId is re-used for cleanup tasks, there's a race where if there is a delay in processing the SIGTERM sent to the child, the TT will allocate slots for the CLEANUP task but will not actually launch it and release the slots. (The old task sends in a done / commitPending which is applied to the TIP of the cleanup task). The patch validates the caller jvm - for all calls in TaskUmbilicalProtocol.
          Hide
          Siddharth Seth added a comment -

          Patch for trunk.

          Show
          Siddharth Seth added a comment - Patch for trunk.
          Hide
          Thomas Graves added a comment -

          Note that MAPREDUCE-2555 was filed to fix extra log messages and kill child off if they do happen to run after the call to done. That should be pulled into trunk once this has a patch for trunk.

          Show
          Thomas Graves added a comment - Note that MAPREDUCE-2555 was filed to fix extra log messages and kill child off if they do happen to run after the call to done. That should be pulled into trunk once this has a patch for trunk.
          Hide
          Arun C Murthy added a comment -

          Sorry, my bad for closing this. Sid is working on porting this to trunk. Thanks for pointing it out.

          Show
          Arun C Murthy added a comment - Sorry, my bad for closing this. Sid is working on porting this to trunk. Thanks for pointing it out.
          Hide
          Todd Lipcon added a comment -

          Does this not need to go into trunk? From looking at the code I see no reason that trunk wouldn't also be affected.

          Show
          Todd Lipcon added a comment - Does this not need to go into trunk? From looking at the code I see no reason that trunk wouldn't also be affected.
          Hide
          Arun C Murthy added a comment -

          I just committed this. Thanks Sid!

          Show
          Arun C Murthy added a comment - I just committed this. Thanks Sid!
          Hide
          Siddharth Seth added a comment -

          There's some tests failing with and without the changes. The only difference is TestJvmReuse which failed with a complete 'ant run' after the changes.
          Running just this one test (with the changes ant test -Dtestcase=TestJvmReuse) passes though.

          Show
          Siddharth Seth added a comment - There's some tests failing with and without the changes. The only difference is TestJvmReuse which failed with a complete 'ant run' after the changes. Running just this one test (with the changes ant test -Dtestcase=TestJvmReuse) passes though.
          Hide
          Siddharth Seth added a comment -

          There are some test case failures - taking a look at those (specifically TestJvmReuse which will likely need to change).
          TT.commitPending cheks the JVM context via TT.statusUpdate (the stack trace in the logs should be adequate to give info on the call). WIll remove the TODO statement.

          Show
          Siddharth Seth added a comment - There are some test case failures - taking a look at those (specifically TestJvmReuse which will likely need to change). TT.commitPending cheks the JVM context via TT.statusUpdate (the stack trace in the logs should be adequate to give info on the call). WIll remove the TODO statement.
          Hide
          Arun C Murthy added a comment -

          Updated to fix some indentation nits. I'm starting a run of 'ant test' now.

          The only remaining question: TT.commitPending doesn't call validateJVM, I don't see why not. Sid?

          Show
          Arun C Murthy added a comment - Updated to fix some indentation nits. I'm starting a run of 'ant test' now. The only remaining question: TT.commitPending doesn't call validateJVM, I don't see why not. Sid?
          Hide
          Arun C Murthy added a comment -

          Hmm... TT.commitPending doesn't check for validity of the JVMContext, is this on purpose?

          Show
          Arun C Murthy added a comment - Hmm... TT.commitPending doesn't check for validity of the JVMContext, is this on purpose?
          Hide
          Arun C Murthy added a comment -

          Also, I see some TODOs, can we remove them?

          Show
          Arun C Murthy added a comment - Also, I see some TODOs, can we remove them?
          Hide
          Arun C Murthy added a comment -

          Some more:

          1. Indentation needed for some changes to keep below 80-char limit.
          2. TaskTracker.logJvmValidateFailed is unused.
          Show
          Arun C Murthy added a comment - Some more: Indentation needed for some changes to keep below 80-char limit. TaskTracker.logJvmValidateFailed is unused.
          Hide
          Arun C Murthy added a comment -

          Sid, the patch looks good. I'll commit when you confirm all unit tests pass.

          Some nits:

          @@ -346,8 +346,9 @@ class ReduceTask extends Task {
          @Override
          @SuppressWarnings("unchecked")
          public void run(JobConf job, final TaskUmbilicalProtocol umbilical)

          • throws IOException, InterruptedException, ClassNotFoundException {
            + throws IOException, InterruptedException, ClassNotFoundException {
            this.umbilical = umbilical;
            + this.jvmContext = jvmContext;
            job.setBoolean("mapred.skip.on", isSkipping());

          if (isMapOrReduce()) {

          That is a unnecessary change - indentation and assignment of a variable to itself.

          There are some more unnecessary indentation changes, bah!

          Show
          Arun C Murthy added a comment - Sid, the patch looks good. I'll commit when you confirm all unit tests pass. Some nits: @@ -346,8 +346,9 @@ class ReduceTask extends Task { @Override @SuppressWarnings("unchecked") public void run(JobConf job, final TaskUmbilicalProtocol umbilical) throws IOException, InterruptedException, ClassNotFoundException { + throws IOException, InterruptedException, ClassNotFoundException { this.umbilical = umbilical; + this.jvmContext = jvmContext; job.setBoolean("mapred.skip.on", isSkipping()); if (isMapOrReduce()) { That is a unnecessary change - indentation and assignment of a variable to itself. There are some more unnecessary indentation changes, bah!
          Hide
          Hadoop QA added a comment -

          -1 overall. Here are the results of testing the latest attachment
          http://issues.apache.org/jira/secure/attachment/12475868/MR2429.patch
          against trunk revision 1090390.

          +1 @author. The patch does not contain any @author tags.

          +1 tests included. The patch appears to include 9 new or modified tests.

          -1 patch. The patch command could not apply the patch.

          Console output: https://hudson.apache.org/hudson/job/PreCommit-MAPREDUCE-Build/161//console

          This message is automatically generated.

          Show
          Hadoop QA added a comment - -1 overall. Here are the results of testing the latest attachment http://issues.apache.org/jira/secure/attachment/12475868/MR2429.patch against trunk revision 1090390. +1 @author. The patch does not contain any @author tags. +1 tests included. The patch appears to include 9 new or modified tests. -1 patch. The patch command could not apply the patch. Console output: https://hudson.apache.org/hudson/job/PreCommit-MAPREDUCE-Build/161//console This message is automatically generated.
          Hide
          Siddharth Seth added a comment -

          TaskUmbilicalProtocol changed to include JvmContext as part of most calls. Validates the jvm making the request.

          Show
          Siddharth Seth added a comment - TaskUmbilicalProtocol changed to include JvmContext as part of most calls. Validates the jvm making the request.

            People

            • Assignee:
              Siddharth Seth
              Reporter:
              Arun C Murthy
            • Votes:
              0 Vote for this issue
              Watchers:
              5 Start watching this issue

              Dates

              • Created:
                Updated:
                Resolved:

                Development