Sreekanth Ramakrishnan added a comment - 22/Apr/09 07:43 AM
Currently, job initialization is done in the poller, JobInitializationPoller. When an exception is thrown during JobInProgress.initTasks(), the poller calls JobInProgress.fail(), but fail() does not inform the job-in-progress listeners. As a result, the job is never removed from the scheduler's waiting job list.
Attaching a fix which addresses this issue, along with a test case that exercises this condition.
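To illustrate the failure mode described above, here is a minimal sketch. It does not use the Hadoop classes: InitFailureSketch, JobListener, WaitingQueue, Poller and the notifyListeners flag are all hypothetical names introduced only for this example. It shows that a job whose initialization throws stays in the waiting queue unless the listeners are told about the failure.

```java
import java.util.ArrayList;
import java.util.LinkedHashSet;
import java.util.List;
import java.util.Set;

// Hypothetical stand-ins for illustration; these are NOT the Hadoop classes.
class InitFailureSketch {

    /** Listener that is told when a job has failed. */
    interface JobListener {
        void jobFailed(String jobId);
    }

    /** Scheduler-side waiting queue; it drops a job only when notified. */
    static class WaitingQueue implements JobListener {
        private final Set<String> waiting = new LinkedHashSet<>();

        void add(String jobId) { waiting.add(jobId); }

        boolean contains(String jobId) { return waiting.contains(jobId); }

        @Override
        public void jobFailed(String jobId) { waiting.remove(jobId); }
    }

    /** Poller that runs job initialization and handles failures. */
    static class Poller {
        private final List<JobListener> listeners = new ArrayList<>();

        void addListener(JobListener listener) { listeners.add(listener); }

        /**
         * Runs initTasks. If it throws, the job is treated as failed.
         * notifyListeners=false models the reported bug (failure is swallowed
         * silently); notifyListeners=true models the intent of the fix.
         */
        void initJob(String jobId, Runnable initTasks, boolean notifyListeners) {
            try {
                initTasks.run();
            } catch (RuntimeException e) {
                if (notifyListeners) {
                    for (JobListener listener : listeners) {
                        listener.jobFailed(jobId);
                    }
                }
            }
        }
    }

    public static void main(String[] args) {
        WaitingQueue queue = new WaitingQueue();
        Poller poller = new Poller();
        poller.addListener(queue);
        Runnable failingInit = () -> { throw new IllegalStateException("init failed"); };

        queue.add("job_1");
        poller.initJob("job_1", failingInit, false);
        System.out.println("no notification, still waiting: " + queue.contains("job_1"));

        poller.initJob("job_1", failingInit, true);
        System.out.println("with notification, still waiting: " + queue.contains("job_1"));
    }
}
```

The attached patch may well structure this differently; the sketch only captures the listener-notification idea, not the actual change.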
The result of ant test-patch is:
[exec]
[exec]
[exec]
[exec] +1 overall.
[exec]
[exec] +1 @author. The patch does not contain any @author tags.
[exec]
[exec] +1 tests included. The patch appears to include 3 new or modified tests.
[exec]
[exec] +1 javadoc. The javadoc tool did not generate any warning messages.
[exec]
[exec] +1 javac. The applied patch does not increase the total number of javac compiler warnings.
[exec]
[exec] +1 findbugs. The patch does not introduce any new Findbugs warnings.
[exec]
[exec] +1 Eclipse classpath. The patch retains Eclipse classpath integrity.
[exec]
[exec] +1 release audit. The applied patch does not increase the total number of release audit warnings.
Few comments:
Attaching patch incorporating most of Vinod's comments.
This has not been incorporated because of the issue described in HADOOP-5020, which is hit when JobInProgress.initTasks() throws an exception and terminate job is called; the capacity scheduler would then never be able to remove the job from the waiting queue.
Also added a new test case, TestJobInitalizationPoller, which uses MiniMRCluster to verify that jobs failing initialization are actually removed from the waiting queue.
Removed an unused import statement from JobInitializationPoller.
Attaching patch incorporating Vinod's offline comments:
Output from ant test-patch:
[exec]
[exec]
[exec] +1 overall.
[exec]
[exec] +1 @author. The patch does not contain any @author tags.
[exec]
[exec] +1 tests included. The patch appears to include 6 new or modified tests.
[exec]
[exec] +1 javadoc. The javadoc tool did not generate any warning messages.
[exec]
[exec] +1 javac. The applied patch does not increase the total number of javac compiler warnings.
[exec]
[exec] +1 findbugs. The patch does not introduce any new Findbugs warnings.
[exec]
[exec] +1 Eclipse classpath. The patch retains Eclipse classpath integrity.
[exec]
[exec] +1 release audit. The applied patch does not increase the total number of release audit warnings.
[exec]
Attaching patch for branch 20.
Attaching patch incorporating Hemanth's offline comments.
In TestJobInitialization
Attaching both 20 branch and trunk patch.
Changes look fine to me. +1.
Output for ant test-patch for the latest attachment is as follows:
[exec]
[exec] +1 overall.
[exec]
[exec] +1 @author. The patch does not contain any @author tags.
[exec]
[exec] +1 tests included. The patch appears to include 6 new or modified tests.
[exec]
[exec] +1 javadoc. The javadoc tool did not generate any warning messages.
[exec]
[exec] +1 javac. The applied patch does not increase the total number of javac compiler warnings.
[exec]
[exec] +1 findbugs. The patch does not introduce any new Findbugs warnings.
[exec]
[exec] +1 Eclipse classpath. The patch retains Eclipse classpath integrity.
[exec]
[exec] +1 release audit. The applied patch does not increase the total number of release audit warnings.
[exec]
I just committed this to trunk and branch 0.20. Thanks, Sreekanth!
Integrated in Hadoop-trunk #828 (See http://hudson.zones.apache.org/hudson/job/Hadoop-trunk/828/).
Forgot to add a new file in the previous commit. Remove jobs that failed initialization from the waiting queue in the capacity scheduler. Contributed by Sreekanth Ramakrishnan.