Details

    • Type: Bug Bug
    • Status: Closed
    • Priority: Major Major
    • Resolution: Fixed
    • Affects Version/s: 0.21.0, 0.22.0, 0.23.0
    • Fix Version/s: 0.20.204.0, 0.23.0
    • Component/s: tasktracker
    • Labels:
      None

      Description

      Currently TT doens't check to ensure jvmid is relevant during communication with the Child via TaskUmbilicalProtocol.

      1. MR2429_trunk.patch
        43 kB
        Siddharth Seth
      2. MR2429.patch
        40 kB
        Siddharth Seth
      3. MR2429-1.patch
        39 kB
        Arun C Murthy

        Issue Links

          Activity

          Transition Time In Source Status Execution Times Last Executer Last Execution Date
          Open Open Patch Available Patch Available
          54m 12s 1 Siddharth Seth 09/Apr/11 00:31
          Patch Available Patch Available Open Open
          1d 19h 50m 1 Arun C Murthy 10/Apr/11 20:22
          Open Open Resolved Resolved
          1d 5h 52m 1 Arun C Murthy 12/Apr/11 02:15
          Resolved Resolved Reopened Reopened
          21d 20h 50m 1 Arun C Murthy 03/May/11 23:05
          Reopened Reopened Patch Available Patch Available
          41d 19h 53m 1 Siddharth Seth 14/Jun/11 18:59
          Patch Available Patch Available Resolved Resolved
          72d 2h 13m 1 Owen O'Malley 25/Aug/11 21:12
          Resolved Resolved Closed Closed
          8d 2h 1m 1 Owen O'Malley 02/Sep/11 23:13
          Gavin made changes -
          Link This issue is depended upon by MAPREDUCE-2555 [ MAPREDUCE-2555 ]
          Gavin made changes -
          Link This issue blocks MAPREDUCE-2555 [ MAPREDUCE-2555 ]
          Owen O'Malley made changes -
          Status Resolved [ 5 ] Closed [ 6 ]
          Hide
          Owen O'Malley added a comment -

          Hadoop 0.20.204.0 was just released.

          Show
          Owen O'Malley added a comment - Hadoop 0.20.204.0 was just released.
          Owen O'Malley made changes -
          Status Patch Available [ 10002 ] Resolved [ 5 ]
          Resolution Fixed [ 1 ]
          Hide
          Siddharth Seth added a comment -

          I have a partial patch for the MR-279 branch. trunk has some additional changes which the patch uses. Instead of incorporating bits from other patches into mr-279, will just work on a combined trunk/mr-279 patch once the merge happens.

          Show
          Siddharth Seth added a comment - I have a partial patch for the MR-279 branch. trunk has some additional changes which the patch uses. Instead of incorporating bits from other patches into mr-279, will just work on a combined trunk/mr-279 patch once the merge happens.
          Hide
          Todd Lipcon added a comment -

          ah, thanks Sid

          Show
          Todd Lipcon added a comment - ah, thanks Sid
          Hide
          Siddharth Seth added a comment -

          MR2447 set the jvmContext earlier in the 20s branch (0.20.204).

          Show
          Siddharth Seth added a comment - MR2447 set the jvmContext earlier in the 20s branch (0.20.204).
          Siddharth Seth made changes -
          Link This issue is related to MAPREDUCE-2447 [ MAPREDUCE-2447 ]
          Hide
          Todd Lipcon added a comment -

          The version of this patch on 0.20-security seems to have a bug in that it doesn't call task.setJvmContext() until after the syncLogs call, etc. Thus, if syncLogs fails, jvmContext is still null, and the status update will have null Jvmcontext, causing NPE. The patch against trunk seems to have fixed this by calling setJvmContext at the top.

          Show
          Todd Lipcon added a comment - The version of this patch on 0.20-security seems to have a bug in that it doesn't call task.setJvmContext() until after the syncLogs call, etc. Thus, if syncLogs fails, jvmContext is still null, and the status update will have null Jvmcontext, causing NPE. The patch against trunk seems to have fixed this by calling setJvmContext at the top.
          Hide
          Siddharth Seth added a comment -

          The test failures should not be related ... cannot access the build system right now to verify. A local ant test-patch was successful.
          The trunk patch includes MR2443, MR2555 and MR2447
          Will work on a patch for the MR-279 branch. Thanks for the input Sharad.

          Show
          Siddharth Seth added a comment - The test failures should not be related ... cannot access the build system right now to verify. A local ant test-patch was successful. The trunk patch includes MR2443, MR2555 and MR2447 Will work on a patch for the MR-279 branch. Thanks for the input Sharad.
          Hide
          Hadoop QA added a comment -

          -1 overall. Here are the results of testing the latest attachment
          http://issues.apache.org/jira/secure/attachment/12482514/MR2429_trunk.patch
          against trunk revision 1135462.

          +1 @author. The patch does not contain any @author tags.

          +1 tests included. The patch appears to include 15 new or modified tests.

          +1 javadoc. The javadoc tool did not generate any warning messages.

          +1 javac. The applied patch does not increase the total number of javac compiler warnings.

          +1 findbugs. The patch does not introduce any new Findbugs (version 1.3.9) warnings.

          +1 release audit. The applied patch does not increase the total number of release audit warnings.

          -1 core tests. The patch failed these core unit tests:
          org.apache.hadoop.cli.TestMRCLI
          org.apache.hadoop.fs.TestFileSystem

          -1 contrib tests. The patch failed contrib unit tests.

          +1 system test framework. The patch passed system test framework compile.

          Test results: https://builds.apache.org/job/PreCommit-MAPREDUCE-Build/393//testReport/
          Findbugs warnings: https://builds.apache.org/job/PreCommit-MAPREDUCE-Build/393//artifact/trunk/build/test/findbugs/newPatchFindbugsWarnings.html
          Console output: https://builds.apache.org/job/PreCommit-MAPREDUCE-Build/393//console

          This message is automatically generated.

          Show
          Hadoop QA added a comment - -1 overall. Here are the results of testing the latest attachment http://issues.apache.org/jira/secure/attachment/12482514/MR2429_trunk.patch against trunk revision 1135462. +1 @author. The patch does not contain any @author tags. +1 tests included. The patch appears to include 15 new or modified tests. +1 javadoc. The javadoc tool did not generate any warning messages. +1 javac. The applied patch does not increase the total number of javac compiler warnings. +1 findbugs. The patch does not introduce any new Findbugs (version 1.3.9) warnings. +1 release audit. The applied patch does not increase the total number of release audit warnings. -1 core tests. The patch failed these core unit tests: org.apache.hadoop.cli.TestMRCLI org.apache.hadoop.fs.TestFileSystem -1 contrib tests. The patch failed contrib unit tests. +1 system test framework. The patch passed system test framework compile. Test results: https://builds.apache.org/job/PreCommit-MAPREDUCE-Build/393//testReport/ Findbugs warnings: https://builds.apache.org/job/PreCommit-MAPREDUCE-Build/393//artifact/trunk/build/test/findbugs/newPatchFindbugsWarnings.html Console output: https://builds.apache.org/job/PreCommit-MAPREDUCE-Build/393//console This message is automatically generated.
          Siddharth Seth made changes -
          Status Reopened [ 4 ] Patch Available [ 10002 ]
          Hide
          Sharad Agarwal added a comment -

          The minor fix required in TaskAttemptImpl state machine is to ignore the TA_DONE event in SUCCEEDED state.

          Correction: ignore the TA_UPDATE event in SUCCEEDED state.

          Show
          Sharad Agarwal added a comment - The minor fix required in TaskAttemptImpl state machine is to ignore the TA_DONE event in SUCCEEDED state. Correction: ignore the TA_UPDATE event in SUCCEEDED state.
          Hide
          Sharad Agarwal added a comment -

          In MR AM, getting status updates from a Task even after DONE call is ok. The minor fix required in TaskAttemptImpl state machine is to ignore the TA_DONE event in SUCCEEDED state. This will lead to ignoring of status updates from a task once it sends the done.

          Show
          Sharad Agarwal added a comment - In MR AM, getting status updates from a Task even after DONE call is ok. The minor fix required in TaskAttemptImpl state machine is to ignore the TA_DONE event in SUCCEEDED state. This will lead to ignoring of status updates from a task once it sends the done.
          Hide
          Arun C Murthy added a comment -

          Sid, please ensure that this patch is committed to MR-279 too. This will mean some changes to the MR AM, so sync with Sharad. Thanks!

          Show
          Arun C Murthy added a comment - Sid, please ensure that this patch is committed to MR-279 too. This will mean some changes to the MR AM, so sync with Sharad. Thanks!
          Hide
          Siddharth Seth added a comment -

          A little more context on the reason for the patch.

          If a task is Killed (speculation or other reasons), there's a possibility that the TaskTracker will end up losing track of the slot(s) used by the task. The jobTracker will continue assigning tasks to the TT though.

          The attemptId is re-used for cleanup tasks, there's a race where if there is a delay in processing the SIGTERM sent to the child, the TT will allocate slots for the CLEANUP task but will not actually launch it and release the slots. (The old task sends in a done / commitPending which is applied to the TIP of the cleanup task).

          The patch validates the caller jvm - for all calls in TaskUmbilicalProtocol.

          Show
          Siddharth Seth added a comment - A little more context on the reason for the patch. If a task is Killed (speculation or other reasons), there's a possibility that the TaskTracker will end up losing track of the slot(s) used by the task. The jobTracker will continue assigning tasks to the TT though. The attemptId is re-used for cleanup tasks, there's a race where if there is a delay in processing the SIGTERM sent to the child, the TT will allocate slots for the CLEANUP task but will not actually launch it and release the slots. (The old task sends in a done / commitPending which is applied to the TIP of the cleanup task). The patch validates the caller jvm - for all calls in TaskUmbilicalProtocol.
          Siddharth Seth made changes -
          Attachment MR2429_trunk.patch [ 12482514 ]
          Hide
          Siddharth Seth added a comment -

          Patch for trunk.

          Show
          Siddharth Seth added a comment - Patch for trunk.
          Siddharth Seth made changes -
          Fix Version/s 0.23.0 [ 12315570 ]
          Affects Version/s 0.21.0 [ 12314045 ]
          Affects Version/s 0.22.0 [ 12314184 ]
          Affects Version/s 0.23.0 [ 12315570 ]
          Chris Douglas made changes -
          Link This issue blocks MAPREDUCE-2555 [ MAPREDUCE-2555 ]
          Hide
          Thomas Graves added a comment -

          Note that MAPREDUCE-2555 was filed to fix extra log messages and kill child off if they do happen to run after the call to done. That should be pulled into trunk once this has a patch for trunk.

          Show
          Thomas Graves added a comment - Note that MAPREDUCE-2555 was filed to fix extra log messages and kill child off if they do happen to run after the call to done. That should be pulled into trunk once this has a patch for trunk.
          Arun C Murthy made changes -
          Resolution Fixed [ 1 ]
          Status Resolved [ 5 ] Reopened [ 4 ]
          Hide
          Arun C Murthy added a comment -

          Sorry, my bad for closing this. Sid is working on porting this to trunk. Thanks for pointing it out.

          Show
          Arun C Murthy added a comment - Sorry, my bad for closing this. Sid is working on porting this to trunk. Thanks for pointing it out.
          Hide
          Todd Lipcon added a comment -

          Does this not need to go into trunk? From looking at the code I see no reason that trunk wouldn't also be affected.

          Show
          Todd Lipcon added a comment - Does this not need to go into trunk? From looking at the code I see no reason that trunk wouldn't also be affected.
          Tsz Wo Nicholas Sze made changes -
          Link This issue is related to MAPREDUCE-2443 [ MAPREDUCE-2443 ]
          Arun C Murthy made changes -
          Release Note I just committed this. Thanks Sid!
          Hide
          Arun C Murthy added a comment -

          I just committed this. Thanks Sid!

          Show
          Arun C Murthy added a comment - I just committed this. Thanks Sid!
          Arun C Murthy made changes -
          Status Open [ 1 ] Resolved [ 5 ]
          Release Note I just committed this. Thanks Sid!
          Fix Version/s 0.20.204.0 [ 12316318 ]
          Resolution Fixed [ 1 ]
          Hide
          Siddharth Seth added a comment -

          There's some tests failing with and without the changes. The only difference is TestJvmReuse which failed with a complete 'ant run' after the changes.
          Running just this one test (with the changes ant test -Dtestcase=TestJvmReuse) passes though.

          Show
          Siddharth Seth added a comment - There's some tests failing with and without the changes. The only difference is TestJvmReuse which failed with a complete 'ant run' after the changes. Running just this one test (with the changes ant test -Dtestcase=TestJvmReuse) passes though.
          Hide
          Siddharth Seth added a comment -

          There are some test case failures - taking a look at those (specifically TestJvmReuse which will likely need to change).
          TT.commitPending cheks the JVM context via TT.statusUpdate (the stack trace in the logs should be adequate to give info on the call). WIll remove the TODO statement.

          Show
          Siddharth Seth added a comment - There are some test case failures - taking a look at those (specifically TestJvmReuse which will likely need to change). TT.commitPending cheks the JVM context via TT.statusUpdate (the stack trace in the logs should be adequate to give info on the call). WIll remove the TODO statement.
          Arun C Murthy made changes -
          Attachment MR2429-1.patch [ 12475976 ]
          Hide
          Arun C Murthy added a comment -

          Updated to fix some indentation nits. I'm starting a run of 'ant test' now.

          The only remaining question: TT.commitPending doesn't call validateJVM, I don't see why not. Sid?

          Show
          Arun C Murthy added a comment - Updated to fix some indentation nits. I'm starting a run of 'ant test' now. The only remaining question: TT.commitPending doesn't call validateJVM, I don't see why not. Sid?
          Hide
          Arun C Murthy added a comment -

          Hmm... TT.commitPending doesn't check for validity of the JVMContext, is this on purpose?

          Show
          Arun C Murthy added a comment - Hmm... TT.commitPending doesn't check for validity of the JVMContext, is this on purpose?
          Hide
          Arun C Murthy added a comment -

          Also, I see some TODOs, can we remove them?

          Show
          Arun C Murthy added a comment - Also, I see some TODOs, can we remove them?
          Hide
          Arun C Murthy added a comment -

          Some more:

          1. Indentation needed for some changes to keep below 80-char limit.
          2. TaskTracker.logJvmValidateFailed is unused.
          Show
          Arun C Murthy added a comment - Some more: Indentation needed for some changes to keep below 80-char limit. TaskTracker.logJvmValidateFailed is unused.
          Arun C Murthy made changes -
          Status Patch Available [ 10002 ] Open [ 1 ]
          Hide
          Arun C Murthy added a comment -

          Sid, the patch looks good. I'll commit when you confirm all unit tests pass.

          Some nits:

          @@ -346,8 +346,9 @@ class ReduceTask extends Task {
          @Override
          @SuppressWarnings("unchecked")
          public void run(JobConf job, final TaskUmbilicalProtocol umbilical)

          • throws IOException, InterruptedException, ClassNotFoundException {
            + throws IOException, InterruptedException, ClassNotFoundException {
            this.umbilical = umbilical;
            + this.jvmContext = jvmContext;
            job.setBoolean("mapred.skip.on", isSkipping());

          if (isMapOrReduce()) {

          That is a unnecessary change - indentation and assignment of a variable to itself.

          There are some more unnecessary indentation changes, bah!

          Show
          Arun C Murthy added a comment - Sid, the patch looks good. I'll commit when you confirm all unit tests pass. Some nits: @@ -346,8 +346,9 @@ class ReduceTask extends Task { @Override @SuppressWarnings("unchecked") public void run(JobConf job, final TaskUmbilicalProtocol umbilical) throws IOException, InterruptedException, ClassNotFoundException { + throws IOException, InterruptedException, ClassNotFoundException { this.umbilical = umbilical; + this.jvmContext = jvmContext; job.setBoolean("mapred.skip.on", isSkipping()); if (isMapOrReduce()) { That is a unnecessary change - indentation and assignment of a variable to itself. There are some more unnecessary indentation changes, bah!
          Hide
          Hadoop QA added a comment -

          -1 overall. Here are the results of testing the latest attachment
          http://issues.apache.org/jira/secure/attachment/12475868/MR2429.patch
          against trunk revision 1090390.

          +1 @author. The patch does not contain any @author tags.

          +1 tests included. The patch appears to include 9 new or modified tests.

          -1 patch. The patch command could not apply the patch.

          Console output: https://hudson.apache.org/hudson/job/PreCommit-MAPREDUCE-Build/161//console

          This message is automatically generated.

          Show
          Hadoop QA added a comment - -1 overall. Here are the results of testing the latest attachment http://issues.apache.org/jira/secure/attachment/12475868/MR2429.patch against trunk revision 1090390. +1 @author. The patch does not contain any @author tags. +1 tests included. The patch appears to include 9 new or modified tests. -1 patch. The patch command could not apply the patch. Console output: https://hudson.apache.org/hudson/job/PreCommit-MAPREDUCE-Build/161//console This message is automatically generated.
          Siddharth Seth made changes -
          Status Open [ 1 ] Patch Available [ 10002 ]
          Hide
          Siddharth Seth added a comment -

          TaskUmbilicalProtocol changed to include JvmContext as part of most calls. Validates the jvm making the request.

          Show
          Siddharth Seth added a comment - TaskUmbilicalProtocol changed to include JvmContext as part of most calls. Validates the jvm making the request.
          Siddharth Seth made changes -
          Field Original Value New Value
          Attachment MR2429.patch [ 12475868 ]
          Arun C Murthy created issue -

            People

            • Assignee:
              Siddharth Seth
              Reporter:
              Arun C Murthy
            • Votes:
              0 Vote for this issue
              Watchers:
              5 Start watching this issue

              Dates

              • Created:
                Updated:
                Resolved:

                Development