Details

    • Type: Bug
    • Status: Resolved
    • Priority: Major
    • Resolution: Fixed
    • Affects Version/s: None
    • Fix Version/s: 0.20.1
    • Component/s: None
    • Labels:
      None
    • Hadoop Flags:
      Reviewed
    • Release Note:
      Job initialization was changed so that it no longer changes the job's run-state. The reason is twofold:
      - changing state during initialization can lead to deadlock, since state changes require circular locking (i.e. JobInProgress requires the JobTracker lock)
      - no events were raised, since these state changes were not propagated back to the JobTracker

      Now the JobTracker takes care of initializing/failing/killing the job and raising the appropriate events. The simple rule that was enforced: "The JobTracker lock *must* be held before changing the run-state of a job".
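The rule above can be illustrated with a minimal sketch. `RuleTracker` and `RuleJob` are hypothetical stand-ins, not Hadoop's `JobTracker`/`JobInProgress`: every run-state change goes through the tracker, which always takes its own monitor first and the job's monitor second, and the job rejects a state change (via `Thread.holdsLock`) when the caller does not hold the tracker lock. The tracker, not the job, records the resulting event.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical stand-in for a job (not Hadoop's JobInProgress).
class RuleJob {
    enum RunState { PREP, RUNNING, SUCCEEDED, FAILED, KILLED }

    private final RuleTracker tracker;
    private RunState state = RunState.PREP;

    RuleJob(RuleTracker tracker) {
        this.tracker = tracker;
    }

    synchronized RunState getState() {
        return state;
    }

    // Run-state changes are only legal while the caller holds the tracker lock.
    synchronized void setRunState(RunState newState) {
        if (!Thread.holdsLock(tracker)) {
            throw new IllegalStateException(
                    "tracker lock must be held before changing the run-state");
        }
        state = newState;
    }
}

// Hypothetical stand-in for the JobTracker, which owns all run-state changes.
class RuleTracker {
    final List<String> events = new ArrayList<>();

    // Lock order is always tracker first, then job; never the reverse.
    synchronized void killJob(RuleJob job) {
        synchronized (job) {
            job.setRunState(RuleJob.RunState.KILLED);
        }
        events.add("KILLED");   // the tracker, not the job, raises the event
    }

    public static void main(String[] args) {
        RuleTracker tracker = new RuleTracker();
        RuleJob job = new RuleJob(tracker);
        tracker.killJob(job);
        System.out.println(job.getState() + " " + tracker.events);
    }
}
```

Calling `setRunState` directly (as a scheduler might) fails fast instead of silently risking a lock-order inversion.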

      Description

      We are running a Hadoop cluster (version 0.20.0) and have detected the following deadlock on our JobTracker:

      "IPC Server handler 51 on 9001":
      	at org.apache.hadoop.mapred.JobInProgress.getCounters(JobInProgress.java:943)
      	- waiting to lock <0x00007f2b6fb46130> (a org.apache.hadoop.mapred.JobInProgress)
      	at org.apache.hadoop.mapred.JobTracker.getJobCounters(JobTracker.java:3102)
      	- locked <0x00007f2b5f026000> (a org.apache.hadoop.mapred.JobTracker)
      	at sun.reflect.GeneratedMethodAccessor21.invoke(Unknown Source)
      	at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:25)
      	at java.lang.reflect.Method.invoke(Method.java:597)
      	at org.apache.hadoop.ipc.RPC$Server.call(RPC.java:508)
      	at org.apache.hadoop.ipc.Server$Handler$1.run(Server.java:959)
      	at org.apache.hadoop.ipc.Server$Handler$1.run(Server.java:955)
      	at java.security.AccessController.doPrivileged(Native Method)
      	at javax.security.auth.Subject.doAs(Subject.java:396)
      	at org.apache.hadoop.ipc.Server$Handler.run(Server.java:953)
       "pool-1-thread-2":
      	at org.apache.hadoop.mapred.JobTracker.finalizeJob(JobTracker.java:2017)
      	- waiting to lock <0x00007f2b5f026000> (a org.apache.hadoop.mapred.JobTracker)
      	at org.apache.hadoop.mapred.JobInProgress.garbageCollect(JobInProgress.java:2483)
      	- locked <0x00007f2b6fb46130> (a org.apache.hadoop.mapred.JobInProgress)
      	at org.apache.hadoop.mapred.JobInProgress.terminateJob(JobInProgress.java:2152)
      	- locked <0x00007f2b6fb46130> (a org.apache.hadoop.mapred.JobInProgress)
      	at org.apache.hadoop.mapred.JobInProgress.terminate(JobInProgress.java:2169)
      	- locked <0x00007f2b6fb46130> (a org.apache.hadoop.mapred.JobInProgress)
      	at org.apache.hadoop.mapred.JobInProgress.fail(JobInProgress.java:2245)
      	- locked <0x00007f2b6fb46130> (a org.apache.hadoop.mapred.JobInProgress)
      	at org.apache.hadoop.mapred.EagerTaskInitializationListener$InitJob.run(EagerTaskInitializationListener.java:86)
      	at java.util.concurrent.ThreadPoolExecutor$Worker.runTask(ThreadPoolExecutor.java:886)
      	at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:908)
      	at java.lang.Thread.run(Thread.java:619)
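The two stacks form a lock-order inversion: "IPC Server handler 51" holds the JobTracker monitor and waits for the JobInProgress monitor, while "pool-1-thread-2" holds the JobInProgress monitor and waits for the JobTracker monitor. The self-contained sketch below reproduces the same cycle with two plain `Object` monitors standing in for the Hadoop classes (the thread names and comments only point at the analogous frames), and detects it with `ThreadMXBean.findDeadlockedThreads()`. The threads are daemons so the JVM can still exit.

```java
import java.lang.management.ManagementFactory;
import java.lang.management.ThreadMXBean;
import java.util.concurrent.CountDownLatch;

class LockInversionDemo {

    // Starts two daemon threads that take two monitors in opposite order and
    // returns true once the JVM reports both of them as deadlocked.
    static boolean reproduce() throws InterruptedException {
        final Object trackerLock = new Object();   // stands in for the JobTracker
        final Object jobLock = new Object();       // stands in for the JobInProgress
        final CountDownLatch bothHoldFirstLock = new CountDownLatch(2);

        Thread ipcHandler = new Thread(() -> {
            synchronized (trackerLock) {           // cf. JobTracker.getJobCounters()
                meetOtherThread(bothHoldFirstLock);
                synchronized (jobLock) { }         // cf. JobInProgress.getCounters()
            }
        }, "ipc-handler");

        Thread initThread = new Thread(() -> {
            synchronized (jobLock) {               // cf. JobInProgress.fail()
                meetOtherThread(bothHoldFirstLock);
                synchronized (trackerLock) { }     // cf. JobTracker.finalizeJob()
            }
        }, "init-thread");

        // Daemon threads do not keep the JVM alive after main returns.
        ipcHandler.setDaemon(true);
        initThread.setDaemon(true);
        ipcHandler.start();
        initThread.start();

        ThreadMXBean mx = ManagementFactory.getThreadMXBean();
        for (int attempt = 0; attempt < 100; attempt++) {
            long[] ids = mx.findDeadlockedThreads();   // null when no deadlock exists
            if (ids != null && contains(ids, ipcHandler.getId())
                    && contains(ids, initThread.getId())) {
                return true;
            }
            Thread.sleep(50);
        }
        return false;
    }

    // Ensures each thread holds its first monitor before reaching for the second.
    private static void meetOtherThread(CountDownLatch latch) {
        latch.countDown();
        try {
            latch.await();
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
        }
    }

    private static boolean contains(long[] ids, long id) {
        for (long candidate : ids) {
            if (candidate == id) {
                return true;
            }
        }
        return false;
    }

    public static void main(String[] args) throws InterruptedException {
        System.out.println("deadlock detected: " + reproduce());
    }
}
```

Taking the monitors in a single global order (tracker first, then job) in both threads removes the cycle.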
      
      1. MAPREDUCE-805-v1.1.patch
        8 kB
        Amar Kamat
      2. MAPREDUCE-805-v1.11.patch
        19 kB
        Amar Kamat
      3. MAPREDUCE-805-v1.11-branch-0.20.patch
        22 kB
        Amar Kamat
      4. MAPREDUCE-805-v1.12.patch
        19 kB
        Amar Kamat
      5. MAPREDUCE-805-v1.12-branch-0.20.patch
        22 kB
        Amar Kamat
      6. MAPREDUCE-805-v1.2.patch
        10 kB
        Amar Kamat
      7. MAPREDUCE-805-v1.3.patch
        10 kB
        Amar Kamat
      8. MAPREDUCE-805-v1.6.patch
        23 kB
        Amar Kamat
      9. MAPREDUCE-805-v1.7.patch
        23 kB
        Amar Kamat

        Issue Links

          Activity

          Michael Tamm created issue -
          Michael Tamm added a comment -

          I just found the reason for the deadlock in the log file:

          2009-07-24 14:45:49,851 INFO org.apache.hadoop.mapred.EagerTaskInitializationListener: Initializing job_200907241445_0001
          2009-07-24 14:45:49,856 ERROR org.apache.hadoop.mapred.EagerTaskInitializationListener: Job initialization failed:
          java.util.regex.PatternSyntaxException: Illegal/unsupported escape sequence near index 69
          jobtracker-dev.broadmail.staging_[0-9]+_job_200907241445_0001_optivo\michael.tamm_\QExtractRecipientIds: 75000000 - 85000000[1/1]\E+
                                                                               ^
          	at java.util.regex.Pattern.error(Pattern.java:1713)
          	at java.util.regex.Pattern.escape(Pattern.java:2177)
          	at java.util.regex.Pattern.atom(Pattern.java:1952)
          	at java.util.regex.Pattern.sequence(Pattern.java:1885)
          	at java.util.regex.Pattern.expr(Pattern.java:1752)
          	at java.util.regex.Pattern.compile(Pattern.java:1460)
          	at java.util.regex.Pattern.<init>(Pattern.java:1133)
          	at java.util.regex.Pattern.compile(Pattern.java:823)
          	at org.apache.hadoop.mapred.JobHistory$JobInfo.getJobHistoryFileName(JobHistory.java:645)
          	at org.apache.hadoop.mapred.JobHistory$JobInfo.logSubmitted(JobHistory.java:853)
          	at org.apache.hadoop.mapred.JobInProgress.initTasks(JobInProgress.java:394)
          	at org.apache.hadoop.mapred.EagerTaskInitializationListener$InitJob.run(EagerTaskInitializationListener.java:81)
          	at java.util.concurrent.ThreadPoolExecutor$Worker.runTask(ThreadPoolExecutor.java:886)
          	at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:908)
          	at java.lang.Thread.run(Thread.java:619)
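The trace shows why initialization failed in the first place: the user name `optivo\michael.tamm` is concatenated into the regex unescaped, so `\m` is parsed as an (illegal) escape sequence, while the job name is already protected by `\Q...\E`. A minimal sketch of the difference, assuming a simplified history file-name layout `<tracker>_<timestamp>_<jobId>_<user>_<jobName>`; the class and helper names are hypothetical, and this is not claimed to be Hadoop's code (the patch on this issue targets the deadlock that followed the failure, not the regex construction).

```java
import java.util.regex.Pattern;
import java.util.regex.PatternSyntaxException;

// Hypothetical helpers; not Hadoop's JobHistory API.
class HistoryNamePattern {
    // Mirrors the failing construction: only the job name is quoted.
    static Pattern unquoted(String tracker, String jobId, String user, String jobName) {
        return Pattern.compile(tracker + "_[0-9]+_" + jobId + "_" + user + "_"
                + Pattern.quote(jobName));
    }

    // Quotes every literal component, so backslashes, dots and brackets lose regex meaning.
    static Pattern quoted(String tracker, String jobId, String user, String jobName) {
        return Pattern.compile(Pattern.quote(tracker) + "_[0-9]+_" + Pattern.quote(jobId)
                + "_" + Pattern.quote(user) + "_" + Pattern.quote(jobName));
    }

    public static void main(String[] args) {
        String tracker = "jobtracker-dev.broadmail.staging";
        String jobId = "job_200907241445_0001";
        String user = "optivo\\michael.tamm";      // Java literal for optivo\michael.tamm
        String jobName = "ExtractRecipientIds: 75000000 - 85000000[1/1]";

        try {
            unquoted(tracker, jobId, user, jobName);
        } catch (PatternSyntaxException e) {
            System.out.println("unquoted: " + e.getDescription() + " near index " + e.getIndex());
        }

        String fileName = tracker + "_1248439549851_" + jobId + "_" + user + "_" + jobName;
        System.out.println("quoted matches: "
                + quoted(tracker, jobId, user, jobName).matcher(fileName).matches());
    }
}
```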
          
          Amar Kamat added a comment -

          Attaching a patch that should fix this. Result of test-patch:
          [exec] +1 overall.
          [exec]
          [exec] +1 @author. The patch does not contain any @author tags.
          [exec]
          [exec] +1 tests included. The patch appears to include 12 new or modified tests.
          [exec]
          [exec] +1 javadoc. The javadoc tool did not generate any warning messages.
          [exec]
          [exec] +1 javac. The applied patch does not increase the total number of javac compiler warnings.
          [exec]
          [exec] +1 findbugs. The patch does not introduce any new Findbugs warnings.
          [exec]
          [exec] +1 release audit. The applied patch does not increase the total number of release audit warnings.

          Running ant-tests.

          Amar Kamat made changes -
          Field Original Value New Value
          Attachment MAPREDUCE-805-v1.1.patch [ 12414606 ]
          Vinod Kumar Vavilapalli added a comment -

          Had a cursory look at the patch. It would be good to add javadoc for JobInProgress.initTasks() and JobInProgress.fail() stating that these methods are NOT supposed to be called directly by the schedulers, and recommending the JobTracker methods over the JobInProgress methods for general use.

          Given this issue, it would also help to document the locking order (JobTracker, then JobInProgress) so that, e.g., schedulers don't lock JobInProgress asynchronously before calling these methods.

          Though not directly related to the patch, it will be good to document that JobTracker is locked while calling JobInProgressListener update methods.
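The last point (listener update methods run while the JobTracker is locked) can be written down as a documented contract. A sketch with hypothetical names (`StateListener`, `ListenerTracker`), not Hadoop's `JobInProgressListener` API: the tracker notifies listeners from inside its own monitor, so a listener may read tracker state but must not wait on a thread that itself needs the tracker lock.

```java
import java.util.ArrayList;
import java.util.List;

/**
 * Hypothetical listener contract (not Hadoop's JobInProgressListener).
 * Implementations are invoked while the tracker's monitor is held, so they
 * must not wait on any thread that needs the tracker lock, and may take a
 * job's lock only after the tracker's (tracker, then job).
 */
interface StateListener {
    void jobUpdated(String jobId, String newState);
}

// Hypothetical tracker that notifies listeners under its own lock.
class ListenerTracker {
    private final List<StateListener> listeners = new ArrayList<>();

    synchronized void addListener(StateListener l) {
        listeners.add(l);
    }

    /** Notifies listeners of a state change; called with this tracker's lock held. */
    synchronized void changeState(String jobId, String newState) {
        for (StateListener l : listeners) {
            l.jobUpdated(jobId, newState);   // still inside the tracker's monitor
        }
    }

    public static void main(String[] args) {
        ListenerTracker tracker = new ListenerTracker();
        tracker.addListener((id, state) ->
                System.out.println(id + " -> " + state + ", tracker locked: "
                        + Thread.holdsLock(tracker)));
        tracker.changeState("job_200907241445_0001", "FAILED");
    }
}
```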

          Vinod Kumar Vavilapalli added a comment -

          Ditto for the javadoc of the kill methods.

          Amar Kamat added a comment -

          Attaching a patch with updated javadoc and some minor fixes. Result of test-patch:
          [exec] +1 overall.
          [exec]
          [exec] +1 @author. The patch does not contain any @author tags.
          [exec]
          [exec] +1 tests included. The patch appears to include 12 new or modified tests.
          [exec]
          [exec] +1 javadoc. The javadoc tool did not generate any warning messages.
          [exec]
          [exec] +1 javac. The applied patch does not increase the total number of javac compiler warnings.
          [exec]
          [exec] +1 findbugs. The patch does not introduce any new Findbugs warnings.
          [exec]
          [exec] +1 release audit. The applied patch does not increase the total number of release audit warnings.

          Running ant-test now

          Amar Kamat made changes -
          Attachment MAPREDUCE-805-v1.2.patch [ 12414712 ]
          Amar Kamat added a comment -

          ant tests passed on my box.

          Amar Kamat added a comment -

          Somehow the dynamic scheduler didn't show up in my list. Fixed that too.

          Amar Kamat made changes -
          Attachment MAPREDUCE-805-v1.3.patch [ 12414743 ]
          Amar Kamat added a comment -

          All contrib tests except TestStreamingExitStatus passed.

          Amar Kamat added a comment -

          Attaching a patch that addresses Devaraj's offline review comments about other deadlock possibilities. Result of test-patch:
          [exec] +1 overall.
          [exec]
          [exec] +1 @author. The patch does not contain any @author tags.
          [exec]
          [exec] +1 tests included. The patch appears to include 21 new or modified tests.
          [exec]
          [exec] +1 javadoc. The javadoc tool did not generate any warning messages.
          [exec]
          [exec] +1 javac. The applied patch does not increase the total number of javac compiler warnings.
          [exec]
          [exec] +1 findbugs. The patch does not introduce any new Findbugs warnings.
          [exec]
          [exec] +1 release audit. The applied patch does not increase the total number of release audit warnings.

          Running ant tests now. Testing in progress.

          Amar Kamat made changes -
          Attachment MAPREDUCE-805-v1.6.patch [ 12415736 ]
          Amar Kamat added a comment -

          Tests failed with 2 errors: TestReduceFetch FAILED (timeout) and TestTaskTrackerMemoryManager FAILED. They don't seem related, but I will debug.

          Amar Kamat added a comment -

          TestReduceFetch and TestTaskTrackerMemoryManager failures are known issues. Contrib tests passed on my box.

          Amar Kamat added a comment -

          Attaching a patch incorporating Devaraj's offline comments. Result of test-patch:
          [exec] +1 overall.
          [exec]
          [exec] +1 @author. The patch does not contain any @author tags.
          [exec]
          [exec] +1 tests included. The patch appears to include 21 new or modified tests.
          [exec]
          [exec] +1 javadoc. The javadoc tool did not generate any warning messages.
          [exec]
          [exec] +1 javac. The applied patch does not increase the total number of javac compiler warnings.
          [exec]
          [exec] +1 findbugs. The patch does not introduce any new Findbugs warnings.
          [exec]
          [exec] +1 release audit. The applied patch does not increase the total number of release audit warnings.

          Amar Kamat made changes -
          Attachment MAPREDUCE-805-v1.7.patch [ 12415849 ]
          Amareshwari Sriramadasu added a comment -

          You can separate the completeEmptyJob() code from setupComplete() and make the JT call only completeEmptyJob() under the lock. setupComplete() can be part of initTasks() itself.
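One way to read this suggestion, sketched with hypothetical classes (the method names mirror the comment, but the bodies are assumptions, not the committed code): `initTasks()` does the setup work and leaves the job in PREP without needing the tracker lock, and only the short run-state transition for an empty job (`completeEmptyJob()`) runs under the tracker lock, taken before the job lock.

```java
// Hypothetical job; records which locks were held at each step for illustration.
class SplitInitJob {
    private final Object trackerLock;
    private final int numMaps;
    private final int numReduces;
    private String runState = "PREP";
    boolean trackerLockedDuringInit;
    boolean trackerLockedDuringCompletion;

    SplitInitJob(Object trackerLock, int numMaps, int numReduces) {
        this.trackerLock = trackerLock;
        this.numMaps = numMaps;
        this.numReduces = numReduces;
    }

    // Bookkeeping only: leaves the job in PREP and needs just the job's own lock.
    // Per the suggestion, the non-state-changing part of setupComplete() would live here.
    synchronized void initTasks() {
        trackerLockedDuringInit = Thread.holdsLock(trackerLock);
    }

    synchronized boolean isEmpty() {
        return numMaps == 0 && numReduces == 0;
    }

    // Short run-state change; the caller is expected to hold the tracker lock.
    synchronized void completeEmptyJob() {
        trackerLockedDuringCompletion = Thread.holdsLock(trackerLock);
        runState = "SUCCEEDED";
    }

    synchronized String getRunState() {
        return runState;
    }
}

// Hypothetical tracker: init outside its lock, state change inside it.
class SplitInitTracker {
    void initJob(SplitInitJob job) {
        job.initTasks();
        synchronized (this) {            // tracker lock first, then the job's
            if (job.isEmpty()) {
                job.completeEmptyJob();
            }
        }
    }
}
```

Usage: `new SplitInitJob(tracker, 0, 0)` ends in SUCCEEDED after `tracker.initJob(job)`, while a job with tasks stays in PREP.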

          Amar Kamat added a comment -

          Attaching a new patch with some bug fixes. Result of test-patch:
          [exec] +1 overall.
          [exec]
          [exec] +1 @author. The patch does not contain any @author tags.
          [exec]
          [exec] +1 tests included. The patch appears to include 18 new or modified tests.
          [exec]
          [exec] +1 javadoc. The javadoc tool did not generate any warning messages.
          [exec]
          [exec] +1 javac. The applied patch does not increase the total number of javac compiler warnings.
          [exec]
          [exec] +1 findbugs. The patch does not introduce any new Findbugs warnings.
          [exec]
          [exec] +1 release audit. The applied patch does not increase the total number of release audit warnings.

          All tests except TestReduceFetch and TestJobTrackerRestartWithLostTracker passed on my box. Rerun of TestJobTrackerRestartWithLostTracker passed. TestReduceFetch is a known issue.

          Amar Kamat made changes -
          Attachment MAPREDUCE-805-v1.11.patch [ 12416056 ]
          Amar Kamat added a comment -

          patch for branch 0.20

          Amar Kamat made changes -
          Attachment MAPREDUCE-805-v1.11-branch-0.20.patch [ 12416059 ]
          Amar Kamat added a comment -

          ant tests on branch-0.20 failed on hdfs.TestDistributedFileSystem, namenode.TestStartup and TestReduceFetch. Contrib tests failed on TestStreamingExitStatus.

          Amareshwari Sriramadasu added a comment -

          Changes look fine to me

          Amar Kamat added a comment -

          We cannot test for the deadlock itself, but I tested the code that changed, i.e. job-init, job-kill, empty-job, job-with-no-setup-cleanup and jobs with 0 maps/reduces.

          job-empty?  setup-cleanup-required?  killed in init?  result
          yes         yes                      yes              pass (job killed)
          yes         yes                      no               pass (setup-cleanup launched and job succeeded)
          yes         no                       yes              pass (job killed)
          yes         no                       no               pass (job marked succeeded in JobTracker.initJob())
          no          yes                      yes              pass (job killed after init)
          no          yes                      no               pass (job runs to completion)
          no          no                       yes              pass (job killed after init)
          no          no                       no               pass (job runs to completion)

          Did I miss anything?

          Amar Kamat added a comment -

          Note that I purposefully added sleeps in JobTracker.initJob() and JobInProgress.initTasks to take care of race conditions. I didn't see any side effects. With this patch, init always leaves the job in PREP state, but based on whether

          • setup is required or not
          • tasks are needed to run
          • job-kill was issued during init
          • job-init failed

          the job can move to the RUNNING, SUCCEEDED, KILLED or FAILED state, or remain in PREP. Here is how the state transitions happen (note that after job.initTasks() the job will be in PREP state):

          setup needed?  maps=0 and reduces=0?  job killed during init?  init failed?  new state  comment
          *              *                      *                        yes           FAILED     regardless of the configuration, a job that fails in init is marked FAILED
          *              *                      yes                      no            KILLED     regardless of the configuration, a job killed during an otherwise normal init is marked KILLED
          yes            *                      no                       no            PREP       a job configured to run setup remains in PREP
          no             yes                    no                       no            SUCCEEDED  no setup configured and no maps or reduces: the job is marked SUCCEEDED
          no             no                     no                       no            RUNNING    no setup configured and at least one map or reduce: the job is marked RUNNING
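The transition table can be read as a decision procedure whose rows are checked top to bottom: init failure first, then a kill, then setup, then the task count. A sketch of that procedure; the enum and method names are hypothetical and are not the code in the patch.

```java
// Hypothetical encoding of the post-init transition table; not the patch's code.
enum PostInitState { PREP, RUNNING, SUCCEEDED, FAILED, KILLED }

final class PostInitTransition {
    private PostInitTransition() { }

    static PostInitState next(boolean setupNeeded, boolean noTasks,
                              boolean killedDuringInit, boolean initFailed) {
        if (initFailed) {
            return PostInitState.FAILED;          // init failure wins regardless of config
        }
        if (killedDuringInit) {
            return PostInitState.KILLED;          // kill issued during an otherwise normal init
        }
        if (setupNeeded) {
            return PostInitState.PREP;            // stays in PREP until setup runs
        }
        return noTasks ? PostInitState.SUCCEEDED  // no setup, no maps and no reduces
                       : PostInitState.RUNNING;   // no setup, at least one task
    }
}
```

For example, `PostInitTransition.next(false, true, false, false)` corresponds to the "no yes no no" row and returns `SUCCEEDED`.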
          Amar Kamat added a comment -

          Note that I purposefully added sleeps in JobTracker.initJob() and JobInProgress.initTasks to take care of race conditions.

          I meant during testing, my bad.

          Vinod Kumar Vavilapalli made changes -
          Link This issue relates to MAPREDUCE-802 [ MAPREDUCE-802 ]
          Vinod Kumar Vavilapalli added a comment -

          This issue clashes with the changes in MAPREDUCE-802. The two issues should be coordinated.

          Amar Kamat added a comment -

          Attaching a patch with extra log info during job-kill. I tested the patch for 0.20 and it works as expected: killing the job during init killed the job, and job-init failure is handled as expected. Also tested with the capacity scheduler to check that JobTracker.failJob() raises events as expected.

          Amar Kamat made changes -
          Attachment MAPREDUCE-805-v1.12.patch [ 12416175 ]
          Attachment MAPREDUCE-805-v1.12-branch-0.20.patch [ 12416176 ]
          Amar Kamat made changes -
          Attachment MAPREDUCE-805-v1.12.patch [ 12416175 ]
          Amar Kamat made changes -
          Attachment MAPREDUCE-805-v1.12.patch [ 12416177 ]
          Devaraj Das added a comment -

          I just committed this. Thanks, Amar!

          Devaraj Das made changes -
          Status Open [ 1 ] Resolved [ 5 ]
          Hadoop Flags [Reviewed]
          Fix Version/s 0.20.1 [ 12314047 ]
          Resolution Fixed [ 1 ]
          Hudson added a comment -

          Integrated in Hadoop-Mapreduce-trunk #46 (See http://hudson.zones.apache.org/hudson/job/Hadoop-Mapreduce-trunk/46/)

          Vinod Kumar Vavilapalli made changes -
          Link This issue duplicates MAPREDUCE-27 [ MAPREDUCE-27 ]
          Amar Kamat made changes -
          Release Note Job initialization was changed so that it no longer changes the job's run-state. The reason is twofold:
          - changing state during initialization can lead to deadlock, since state changes require circular locking (i.e. JobInProgress requires the JobTracker lock)
          - no events were raised, since these state changes were not propagated back to the JobTracker

          Now the JobTracker takes care of initializing/failing/killing the job and raising the appropriate events. The simple rule that was enforced: "The JobTracker lock *must* be held before changing the run-state of a job".
          Amar Kamat made changes -
          Assignee Amar Kamat [ amar_kamat ]

            People

            • Assignee: Amar Kamat
            • Reporter: Michael Tamm
            • Votes: 0
            • Watchers: 5
