Attaching a patch that fixes this issue by moving a job to the running state upon a setup success only if the job is in the prep state. Result of test-patch:
[exec] -1 overall.
[exec]
[exec] +1 @author. The patch does not contain any @author tags.
[exec]
[exec] -1 tests included. The patch doesn't appear to include any new or modified tests.
[exec] Please justify why no tests are needed for this patch.
[exec]
[exec] +1 javadoc. The javadoc tool did not generate any warning messages.
[exec]
[exec] +1 javac. The applied patch does not increase the total number of javac compiler warnings.
[exec]
[exec] +1 findbugs. The patch does not introduce any new Findbugs warnings.
[exec]
[exec] +1 Eclipse classpath. The patch retains Eclipse classpath integrity.
[exec]
[exec] +1 release audit. The applied patch does not increase the total number of release audit warnings.
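For readers following the thread, the guard described above can be sketched roughly as follows. This is a minimal, simplified model and not the actual patch: the class `JobStateSketch`, its `State` enum, and the method names `kill` and `onSetupSucceeded` are all hypothetical, chosen only to illustrate "go to running on setup success only if the job is still in prep".

```java
// Hypothetical, simplified model of the job state transition discussed here.
// It is NOT the real JobInProgress code; all names are illustrative.
class JobStateSketch {
    enum State { PREP, RUNNING, KILLED }

    private State state = State.PREP;

    // Models a kill request arriving while the setup task is still running.
    public synchronized void kill() {
        state = State.KILLED;
    }

    // Models the setup task reporting success.
    // The guard: only a job that is still in PREP may move to RUNNING,
    // so a late setup success cannot revive a job that was already killed.
    public synchronized void onSetupSucceeded() {
        if (state == State.PREP) {
            state = State.RUNNING;
        }
    }

    public synchronized State getState() {
        return state;
    }

    public static void main(String[] args) {
        // Kill arrives first, setup success arrives late: job stays KILLED.
        JobStateSketch killedJob = new JobStateSketch();
        killedJob.kill();
        killedJob.onSetupSucceeded();
        System.out.println(killedJob.getState()); // prints "KILLED"

        // Normal case: setup succeeds while the job is in PREP.
        JobStateSketch normalJob = new JobStateSketch();
        normalJob.onSetupSucceeded();
        System.out.println(normalJob.getState()); // prints "RUNNING"
    }
}
```

Without the `state == State.PREP` check, the first case would end in RUNNING, which is the behaviour this issue reports.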
Running ant test now.
I just committed this. Thanks, Amar!
Devaraj,
Why did you commit this without a test or justification?
Integrated in Hadoop-trunk #830 (See http://hudson.zones.apache.org/hudson/job/Hadoop-trunk/830/)
Prevents a job from going to RUNNING state after it has been KILLED (this used to happen when the SetupTask would come back with a success after the job has been killed). Contributed by Amar Kamat.
Nigel,
It's not easy to write a test case for this. The situation is something like this:
The only hard part is to make the tracker with the setup task return at the same time.
Amar, so did you manually test this fix or not? If manually tested, can you describe the manual test?
@Nigel : Karam tested this patch.
@Karam : can you please describe how you tested this patch?
Submitted a job whose setup task runs for 3 minutes.
While the job's setup task is running, go to the TT on which the setup task is running and suspend the TT process.
Issue hadoop job -kill.
Checked that the job moved to the killed state.
Resume the TT (the TT process should be resumed at the time the setup task completes).
Without the 5636 patch applied:
The job is switched to the running state.
The job is not removed from the capacity scheduler queue.
We can see a NullPointerException in the JobTracker log on assignTask.
No new job is scheduled.
With the patch:
JobTracker log for the job :
06:40:08,409 INFO org.apache.hadoop.mapred.JobHistory: Deleting job history file xyz
06:40:11,621 INFO org.apache.hadoop.mapred.JobTracker: Restoration complete
06:40:11,694 INFO org.apache.hadoop.mapred.JobTracker: Adding task 'attempt_200903310541_9080_m_000061_0' to tip task_200903310541_9080_m_000061, for tracker 'xxx'
06:40:11,737 INFO org.apache.hadoop.mapred.JobInProgress: Killingjob 'job_200903310541_9080'
06:40:11,748 INFO org.apache.hadoop.mapred.JobTracker: Adding task 'attempt_200903310541_9080_m_000060_0' to tip task_200903310541_9080_m_000060, for tracker 'xxxx'
06:40:11,750 INFO org.apache.hadoop.mapred.JobTracker: Adding task 'attempt_200903310541_9080_r_000035_0' to tip task_200903310541_9080_r_000035, for tracker 'xxxxxx'
06:40:11,803 INFO org.apache.hadoop.mapred.JobTracker: Adding task 'attempt_200903310541_9080_r_000047_0' to tip task_200903310541_9080_r_000047, for tracker 'xxxxxxxx'
.
.
.
all reducers are launched.
06:40:36,568 INFO org.apache.hadoop.mapred.JobInProgress: Task 'attempt_200903310541_9080_m_000060_0' has completed task_200903310541_9080_m_000060 successfully.
06:40:41,980 INFO org.apache.hadoop.mapred.JobHistory: Recovered job history filename for job job_200903310541_9080 is xyz
06:40:41,981 INFO org.apache.hadoop.mapred.JobHistory: Renaming xyz.recover to xyz
06:40:42,001 INFO org.apache.hadoop.mapred.JobTracker: Removed completed task 'attempt_200903310541_9080_m_000060_0' from 'xxx'
06:40:42,061 INFO org.apache.hadoop.mapred.JobTracker: Removed completed task 'attempt_200903310541_9080_r_000035_0' from 'xxxx'
06:40:42,073 INFO org.apache.hadoop.mapred.JobTracker: Removed completed task 'attempt_200903310541_9080_r_000047_0' from 'xxxxx'
06:40:42,256 INFO org.apache.hadoop.mapred.JobInProgress: Task 'attempt_200903310541_9080_m_000061_0' has completed task_200903310541_9080_m_000061 successfully.
06:40:42,263 INFO org.apache.hadoop.mapred.JobTracker: Adding task 'attempt_200903310541_9080_r_000002_0' to tip task_200903310541_9080_r_000002, for tracker xxx
06:41:26,579 INFO org.apache.hadoop.mapred.JobTracker: Removed completed task 'attempt_200903310541_9080_r_000002_0' from xxxx
06:40:42,271 INFO org.apache.hadoop.mapred.JobTracker: Removed completed task 'attempt_200903310541_9080_r_000043_0' from xxx
06:40:42,338 INFO org.apache.hadoop.mapred.JobTracker: Removed completed task 'attempt_200903310541_9080_r_000010_1' from xxxx
.
.
.