Details

    • Type: Bug
    • Status: Closed
    • Priority: Major
    • Resolution: Fixed
    • Affects Version/s: 2.0.0-alpha
    • Fix Version/s: 0.23.3, 2.0.2-alpha
    • Component/s: mrv2, test
    • Labels:
      None

      Description

      The TestClusterMRNotification test is often timing out. Testing with git bisect narrowed the cause down to MAPREDUCE-3921: the test consistently passes before that change and times out most of the time after picking it up.

          Activity

          Hudson added a comment -

          Integrated in Hadoop-Hdfs-0.23-Build #306 (See https://builds.apache.org/job/Hadoop-Hdfs-0.23-Build/306/)
          svn merge -c 1355124 FIXES: MAPREDUCE-4376. TestClusterMRNotification times out (Kihwal Lee via bobby) (Revision 1358418)

          Result = UNSTABLE
          bobby : http://svn.apache.org/viewcvs.cgi/?root=Apache-SVN&view=rev&rev=1358418
          Files :

          • /hadoop/common/branches/branch-0.23/hadoop-mapreduce-project/CHANGES.txt
          • /hadoop/common/branches/branch-0.23/hadoop-mapreduce-project/hadoop-mapreduce-client/hadoop-mapreduce-client-jobclient/src/test/java/org/apache/hadoop/mapred/NotificationTestCase.java
          • /hadoop/common/branches/branch-0.23/hadoop-mapreduce-project/hadoop-mapreduce-client/hadoop-mapreduce-client-jobclient/src/test/java/org/apache/hadoop/mapred/UtilsForTests.java
          Hudson added a comment -

          Integrated in Hadoop-Mapreduce-trunk #1124 (See https://builds.apache.org/job/Hadoop-Mapreduce-trunk/1124/)
          MAPREDUCE-4376. TestClusterMRNotification times out (Kihwal Lee via bobby) (Revision 1355124)

          Result = FAILURE
          bobby : http://svn.apache.org/viewcvs.cgi/?root=Apache-SVN&view=rev&rev=1355124
          Files :

          • /hadoop/common/trunk/hadoop-mapreduce-project/CHANGES.txt
          • /hadoop/common/trunk/hadoop-mapreduce-project/hadoop-mapreduce-client/hadoop-mapreduce-client-app/src/main/java/org/apache/hadoop/mapreduce/v2/app/rm/RMContainerAllocator.java
          • /hadoop/common/trunk/hadoop-mapreduce-project/hadoop-mapreduce-client/hadoop-mapreduce-client-jobclient/src/test/java/org/apache/hadoop/mapred/NotificationTestCase.java
          • /hadoop/common/trunk/hadoop-mapreduce-project/hadoop-mapreduce-client/hadoop-mapreduce-client-jobclient/src/test/java/org/apache/hadoop/mapred/UtilsForTests.java
          Hudson added a comment -

          Integrated in Hadoop-Hdfs-trunk #1091 (See https://builds.apache.org/job/Hadoop-Hdfs-trunk/1091/)
          MAPREDUCE-4376. TestClusterMRNotification times out (Kihwal Lee via bobby) (Revision 1355124)

          Result = FAILURE
          bobby : http://svn.apache.org/viewcvs.cgi/?root=Apache-SVN&view=rev&rev=1355124
          Files :

          • /hadoop/common/trunk/hadoop-mapreduce-project/CHANGES.txt
          • /hadoop/common/trunk/hadoop-mapreduce-project/hadoop-mapreduce-client/hadoop-mapreduce-client-app/src/main/java/org/apache/hadoop/mapreduce/v2/app/rm/RMContainerAllocator.java
          • /hadoop/common/trunk/hadoop-mapreduce-project/hadoop-mapreduce-client/hadoop-mapreduce-client-jobclient/src/test/java/org/apache/hadoop/mapred/NotificationTestCase.java
          • /hadoop/common/trunk/hadoop-mapreduce-project/hadoop-mapreduce-client/hadoop-mapreduce-client-jobclient/src/test/java/org/apache/hadoop/mapred/UtilsForTests.java
          Hudson added a comment -

          Integrated in Hadoop-Mapreduce-trunk-Commit #2423 (See https://builds.apache.org/job/Hadoop-Mapreduce-trunk-Commit/2423/)
          MAPREDUCE-4376. TestClusterMRNotification times out (Kihwal Lee via bobby) (Revision 1355124)

          Result = FAILURE
          bobby : http://svn.apache.org/viewcvs.cgi/?root=Apache-SVN&view=rev&rev=1355124
          Files :

          • /hadoop/common/trunk/hadoop-mapreduce-project/CHANGES.txt
          • /hadoop/common/trunk/hadoop-mapreduce-project/hadoop-mapreduce-client/hadoop-mapreduce-client-app/src/main/java/org/apache/hadoop/mapreduce/v2/app/rm/RMContainerAllocator.java
          • /hadoop/common/trunk/hadoop-mapreduce-project/hadoop-mapreduce-client/hadoop-mapreduce-client-jobclient/src/test/java/org/apache/hadoop/mapred/NotificationTestCase.java
          • /hadoop/common/trunk/hadoop-mapreduce-project/hadoop-mapreduce-client/hadoop-mapreduce-client-jobclient/src/test/java/org/apache/hadoop/mapred/UtilsForTests.java
          Hudson added a comment -

          Integrated in Hadoop-Common-trunk-Commit #2404 (See https://builds.apache.org/job/Hadoop-Common-trunk-Commit/2404/)
          MAPREDUCE-4376. TestClusterMRNotification times out (Kihwal Lee via bobby) (Revision 1355124)

          Result = SUCCESS
          bobby : http://svn.apache.org/viewcvs.cgi/?root=Apache-SVN&view=rev&rev=1355124
          Files :

          • /hadoop/common/trunk/hadoop-mapreduce-project/CHANGES.txt
          • /hadoop/common/trunk/hadoop-mapreduce-project/hadoop-mapreduce-client/hadoop-mapreduce-client-app/src/main/java/org/apache/hadoop/mapreduce/v2/app/rm/RMContainerAllocator.java
          • /hadoop/common/trunk/hadoop-mapreduce-project/hadoop-mapreduce-client/hadoop-mapreduce-client-jobclient/src/test/java/org/apache/hadoop/mapred/NotificationTestCase.java
          • /hadoop/common/trunk/hadoop-mapreduce-project/hadoop-mapreduce-client/hadoop-mapreduce-client-jobclient/src/test/java/org/apache/hadoop/mapred/UtilsForTests.java
          Hudson added a comment -

          Integrated in Hadoop-Hdfs-trunk-Commit #2472 (See https://builds.apache.org/job/Hadoop-Hdfs-trunk-Commit/2472/)
          MAPREDUCE-4376. TestClusterMRNotification times out (Kihwal Lee via bobby) (Revision 1355124)

          Result = SUCCESS
          bobby : http://svn.apache.org/viewcvs.cgi/?root=Apache-SVN&view=rev&rev=1355124
          Files :

          • /hadoop/common/trunk/hadoop-mapreduce-project/CHANGES.txt
          • /hadoop/common/trunk/hadoop-mapreduce-project/hadoop-mapreduce-client/hadoop-mapreduce-client-app/src/main/java/org/apache/hadoop/mapreduce/v2/app/rm/RMContainerAllocator.java
          • /hadoop/common/trunk/hadoop-mapreduce-project/hadoop-mapreduce-client/hadoop-mapreduce-client-jobclient/src/test/java/org/apache/hadoop/mapred/NotificationTestCase.java
          • /hadoop/common/trunk/hadoop-mapreduce-project/hadoop-mapreduce-client/hadoop-mapreduce-client-jobclient/src/test/java/org/apache/hadoop/mapred/UtilsForTests.java
          Robert Joseph Evans added a comment -

          Thanks Kihwal, I put this into trunk and branch-2

          Robert Joseph Evans added a comment -

          The changes look good to me. All of the changes are to test code, and Jenkins gave it a +1, so I give it a +1 too. Thanks for the fixes, Kihwal. I'll check them in.

          Hadoop QA added a comment -

          +1 overall. Here are the results of testing the latest attachment
          http://issues.apache.org/jira/secure/attachment/12533846/mapreduce-4376.patch
          against trunk revision .

          +1 @author. The patch does not contain any @author tags.

          +1 tests included. The patch appears to include 2 new or modified test files.

          +1 javac. The applied patch does not increase the total number of javac compiler warnings.

          +1 javadoc. The javadoc tool did not generate any warning messages.

          +1 eclipse:eclipse. The patch built with eclipse:eclipse.

          +1 findbugs. The patch does not introduce any new Findbugs (version 1.3.9) warnings.

          +1 release audit. The applied patch does not increase the total number of release audit warnings.

          +1 core tests. The patch passed unit tests in hadoop-mapreduce-project/hadoop-mapreduce-client/hadoop-mapreduce-client-app hadoop-mapreduce-project/hadoop-mapreduce-client/hadoop-mapreduce-client-jobclient.

          +1 contrib tests. The patch passed contrib unit tests.

          Test results: https://builds.apache.org/job/PreCommit-MAPREDUCE-Build/2526//testReport/
          Console output: https://builds.apache.org/job/PreCommit-MAPREDUCE-Build/2526//console

          This message is automatically generated.

          Kihwal Lee added a comment -
          • Also verified that the timeout works when the bug fix is missing.
          -------------------------------------------------------------------------------
          Test set: org.apache.hadoop.mapred.TestClusterMRNotification
          -------------------------------------------------------------------------------
          Tests run: 1, Failures: 0, Errors: 1, Skipped: 0, Time elapsed: 77.437 sec <<< FAILURE!
          testMR(org.apache.hadoop.mapred.TestClusterMRNotification)  Time elapsed: 77.365 sec  <<< ERROR!
          java.io.IOException: Job cleanup didn't start in 30 seconds
                  at org.apache.hadoop.mapred.UtilsForTests.runJobKill(UtilsForTests.java:676)
                  at org.apache.hadoop.mapred.NotificationTestCase.testMR(NotificationTestCase.java:174)
          
          Kihwal Lee added a comment -

          What this patch does:

          • Fixes the NPE bug in RMContainerAllocator.
          • Improves UtilsForTests by making the kill/fail job runner time out.
          • Improves NotificationTestCase by having it check for more failure conditions.

          TestJobHistory, TestJobInProgressListener and TestJobKillAndFail also call the kill/fail job runner in UtilsForTests. All three passed with the new timeout.
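
The timeout idea for the kill/fail job runner can be illustrated with a small polling helper. This is only a sketch under assumptions, not the code from the patch: the class `PollWithTimeout`, the method `waitFor`, and its parameters are hypothetical names chosen for illustration. The message format is modeled on the "Job cleanup didn't start in 30 seconds" error reported in this issue.

```java
import java.io.IOException;
import java.util.function.BooleanSupplier;

// Hypothetical helper; not part of UtilsForTests.
public class PollWithTimeout {

    // Polls the condition until it holds. Throws IOException if it still
    // does not hold once timeoutMs milliseconds have elapsed.
    static void waitFor(BooleanSupplier condition, long timeoutMs, long pollMs, String what)
            throws IOException, InterruptedException {
        long deadline = System.nanoTime() + timeoutMs * 1_000_000L;
        while (!condition.getAsBoolean()) {
            // Overflow-safe comparison against the deadline.
            if (System.nanoTime() - deadline >= 0) {
                throw new IOException(what + " in " + (timeoutMs / 1000) + " seconds");
            }
            Thread.sleep(pollMs);
        }
    }

    public static void main(String[] args) throws Exception {
        long start = System.currentTimeMillis();
        // A condition that becomes true after roughly 200 ms.
        waitFor(() -> System.currentTimeMillis() - start > 200, 2000, 50, "Condition wasn't met");
        System.out.println("condition met");

        // A condition that never becomes true, so the wait gives up.
        try {
            waitFor(() -> false, 1000, 100, "Job cleanup didn't start");
        } catch (IOException e) {
            System.out.println("timed out: " + e.getMessage());
        }
    }
}
```

With a bound like this, a job that never reaches the expected state fails the test with a descriptive error instead of hanging until the build's own timeout fires.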

          Kihwal Lee added a comment -

          There is a null check to handle transitions from the UNASSIGNED state, but it no longer works because assignedRequests.get() throws an NPE after the following change from MAPREDUCE-3921.

               ContainerId get(TaskAttemptId tId) {
                 if (tId.getTaskId().getTaskType().equals(TaskType.MAP)) {
          -        return maps.get(tId);
          +        return maps.get(tId).getId();
                 } else {
          -        return reduces.get(tId);
          +        return reduces.get(tId).getId();
                 }
               }
          

          Jason has also suggested we put a time limit in these jobs so that they don't hang even if something goes wrong.
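
One way a lookup like the one in the diff above could be made null-safe is sketched here. It is an assumption-laden illustration, not the committed fix: `AssignedRequestsSketch` and its nested `Container` class are hypothetical stand-ins, and plain strings replace the real `TaskAttemptId` and `ContainerId` types.

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical, simplified stand-in for the real class; not Hadoop code.
public class AssignedRequestsSketch {

    static final class Container {
        private final String id;
        Container(String id) { this.id = id; }
        String getId() { return id; }
    }

    // Attempt ID -> container assigned to that attempt.
    private final Map<String, Container> maps = new HashMap<>();

    void add(String attemptId, Container container) {
        maps.put(attemptId, container);
    }

    // Returns null when the attempt was never assigned a container,
    // instead of dereferencing a missing map entry.
    String get(String attemptId) {
        Container container = maps.get(attemptId);
        return (container == null) ? null : container.getId();
    }

    public static void main(String[] args) {
        AssignedRequestsSketch assigned = new AssignedRequestsSketch();
        assigned.add("attempt_m_000000_0", new Container("container_01"));
        System.out.println(assigned.get("attempt_m_000000_0")); // prints container_01
        System.out.println(assigned.get("attempt_m_000001_0")); // prints null
    }
}
```

Returning null for an unknown attempt keeps the caller's existing null check meaningful, which is the behavior the lookup had before `.getId()` was chained onto the map access.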

          Kihwal Lee added a comment -

          Relevant log entries:

          2012-06-27 08:48:55,331 INFO [IPC Server handler 0 on 57856] org.apache.hadoop.mapreduce.v2.app.client.MRClientService: Kill Job received from client job_1340812108963_0002
          2012-06-27 08:48:55,332 INFO [AsyncDispatcher event handler] org.apache.hadoop.mapreduce.v2.app.job.impl.JobImpl: job_1340812108963_0002Job Transitioned from RUNNING to KILL_WAIT
          2012-06-27 08:48:55,332 INFO [AsyncDispatcher event handler] org.apache.hadoop.mapreduce.v2.app.job.impl.TaskImpl: task_1340812108963_0002_m_000000 Task Transitioned from SCHEDULED to KILL_WAIT
          2012-06-27 08:48:55,332 INFO [AsyncDispatcher event handler] org.apache.hadoop.mapreduce.v2.app.job.impl.TaskImpl: task_1340812108963_0002_m_000001 Task Transitioned from SCHEDULED to KILL_WAIT
          2012-06-27 08:48:55,333 INFO [AsyncDispatcher event handler] org.apache.hadoop.mapreduce.v2.app.job.impl.TaskImpl: task_1340812108963_0002_r_000000 Task Transitioned from SCHEDULED to KILL_WAIT
          2012-06-27 08:48:55,334 INFO [AsyncDispatcher event handler] org.apache.hadoop.mapreduce.v2.app.job.impl.TaskAttemptImpl: attempt_1340812108963_0002_m_000000_0 TaskAttempt Transitioned from UNASSIGNED to KILLED
          2012-06-27 08:48:55,334 INFO [AsyncDispatcher event handler] org.apache.hadoop.mapreduce.v2.app.job.impl.TaskAttemptImpl: attempt_1340812108963_0002_m_000001_0 TaskAttempt Transitioned from UNASSIGNED to KILLED
          2012-06-27 08:48:55,335 INFO [AsyncDispatcher event handler] org.apache.hadoop.mapreduce.v2.app.job.impl.TaskAttemptImpl: attempt_1340812108963_0002_r_000000_0 TaskAttempt Transitioned from UNASSIGNED to KILLED
          2012-06-27 08:48:55,335 INFO [Thread-45] org.apache.hadoop.mapreduce.v2.app.rm.RMContainerAllocator: Processing the event EventType: CONTAINER_DEALLOCATE
          2012-06-27 08:48:55,338 INFO [AsyncDispatcher event handler] org.apache.hadoop.mapreduce.v2.app.job.impl.TaskImpl: task_1340812108963_0002_m_000000 Task Transitioned from KILL_WAIT to KILLED
          2012-06-27 08:48:55,338 INFO [Thread-45] org.apache.hadoop.mapreduce.v2.app.rm.RMContainerAllocator: Processing the event EventType: CONTAINER_DEALLOCATE
          2012-06-27 08:48:55,338 INFO [AsyncDispatcher event handler] org.apache.hadoop.mapreduce.v2.app.job.impl.TaskImpl: task_1340812108963_0002_m_000001 Task Transitioned from KILL_WAIT to KILLED
          2012-06-27 08:48:55,338 INFO [Thread-45] org.apache.hadoop.mapreduce.v2.app.rm.RMContainerAllocator: Processing the event EventType: CONTAINER_DEALLOCATE
          2012-06-27 08:48:55,339 INFO [AsyncDispatcher event handler] org.apache.hadoop.mapreduce.v2.app.job.impl.TaskImpl: task_1340812108963_0002_r_000000 Task Transitioned from KILL_WAIT to KILLED
          2012-06-27 08:48:55,339 INFO [AsyncDispatcher event handler] org.apache.hadoop.mapreduce.v2.app.job.impl.JobImpl: Num completed Tasks: 1
          2012-06-27 08:48:55,339 INFO [AsyncDispatcher event handler] org.apache.hadoop.mapreduce.v2.app.job.impl.JobImpl: Num completed Tasks: 2
          2012-06-27 08:48:55,340 INFO [AsyncDispatcher event handler] org.apache.hadoop.mapreduce.v2.app.job.impl.JobImpl: Num completed Tasks: 3
          2012-06-27 08:48:55,341 ERROR [Thread-45] org.apache.hadoop.mapreduce.v2.app.rm.RMContainerAllocator: Error in handling event type CONTAINER_DEALLOCATE to the ContainreAllocator
          java.lang.NullPointerException
                  at org.apache.hadoop.mapreduce.v2.app.rm.RMContainerAllocator$AssignedRequests.get(RMContainerAllocator.java:1103)
                  at org.apache.hadoop.mapreduce.v2.app.rm.RMContainerAllocator.handleEvent(RMContainerAllocator.java:339)
                  at org.apache.hadoop.mapreduce.v2.app.rm.RMContainerAllocator$1.run(RMContainerAllocator.java:191)
          2012-06-27 08:48:55,348 INFO [AsyncDispatcher event handler] org.apache.hadoop.mapreduce.v2.app.job.impl.JobImpl: job_1340812108963_0002Job Transitioned from KILL_WAIT to KILLED
          2012-06-27 08:48:55,348 INFO [AsyncDispatcher event handler] org.apache.hadoop.mapreduce.v2.app.job.impl.JobImpl: job_1340812108963_0002Job Transitioned from KILLED to ERROR
          

          The code assumes that if the attempt ID is not found in scheduledRequests, it will be in assignedRequests. But in this case, it was still in UNASSIGNED.
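
The assumption described above can be made concrete with a toy model of the deallocate path. This is a sketch under stated assumptions and not the real RMContainerAllocator logic: `DeallocateSketch`, its fields, and the returned strings are all hypothetical, and attempt and container IDs are plain strings.

```java
import java.util.HashMap;
import java.util.HashSet;
import java.util.Map;
import java.util.Set;

// Hypothetical, simplified model of the deallocate path; names are illustrative only.
public class DeallocateSketch {

    private final Set<String> scheduled = new HashSet<>();
    private final Map<String, String> assigned = new HashMap<>(); // attempt ID -> container ID

    void schedule(String attemptId) {
        scheduled.add(attemptId);
    }

    void assign(String attemptId, String containerId) {
        scheduled.remove(attemptId);
        assigned.put(attemptId, containerId);
    }

    // Describes what was released for the given attempt.
    String deallocate(String attemptId) {
        if (scheduled.remove(attemptId)) {
            return "removed scheduled request";
        }
        String containerId = assigned.remove(attemptId);
        if (containerId == null) {
            // The attempt is in neither collection, for example because it
            // was killed before any request for it was recorded.
            return "nothing to release";
        }
        return "released " + containerId;
    }

    public static void main(String[] args) {
        DeallocateSketch allocator = new DeallocateSketch();
        allocator.schedule("attempt_m_000000_0");
        allocator.assign("attempt_m_000001_0", "container_02");

        System.out.println(allocator.deallocate("attempt_m_000000_0")); // prints removed scheduled request
        System.out.println(allocator.deallocate("attempt_m_000001_0")); // prints released container_02
        System.out.println(allocator.deallocate("attempt_r_000000_0")); // prints nothing to release
    }
}
```

The third case is the one the log shows: the handler has to tolerate an attempt that is found in neither collection rather than assume the second lookup always succeeds.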

          Kihwal Lee added a comment -

          It used to be

          job 1, SUCCEEDED, SUCCEEDED
          job 2, KILLED, KILLED
          job 3, FAILED, FAILED

          Now it's getting

          job 1, SUCCEEDED, SUCCEEDED
          job 2, ERROR, ERROR

          The test hangs after job 2.


            People

            • Assignee:
              Kihwal Lee
              Reporter:
              Jason Lowe
            • Votes:
              0
              Watchers:
              7

              Dates

              • Created:
                Updated:
                Resolved:

                Development