|
Making queues explicit makes sense for the purposes of getSchedulingInfo then. As for what it should do when applied to a job, in the fair scheduler at least we can have it show the job's fair share of map slots / reduce slots and its weight in the fair sharing calculations. This was useful both for debugging and for letting administrators understand the effects of putting jobs in a particular pool, changing their priority, etc.
Regarding the comparator, I made it that because Owen/Sameer/Arun wanted to also be able to compare a subset of the jobs, for example to be able to filter jobs by user or something of that sort. With a comparator, you choose your subset as you wish and then sort it. (In all this I'm assuming that the JobTracker or JobQueueManager knows the full list of jobs and can therefore filter it.) However, it would also be possible to return the whole job list and filter it afterwards - which one is easier? Added getSchedulingInfo(queue).
We probably also need a way to get all the configured queues. Something like:
By default, this would return the single default queue that is there today in the jobtracker. Makes sense ?
I don't think a Comparator is the right abstraction here. There is a difference between filtering and reordering. A Comparator is probably needed for the latter, but not for filtering. The Scheduler imposes an ordering on the jobs. A caller may choose to see (filter) only some of those jobs, but the ordering is determined by the Scheduler. I think you need a method like: Collection<JobInProgress> getJobs(String queueName)
Users can filter this collection as they seem fit. Makes sense, I've changed it to that.
Attaching patch with adding API's in the TaskScheduler to expose scheduling information related to it.
Added following methods to TaskScheduler: Map<String,String> getSchedulingInfo(JobInProgress job) - Returns map containing scheduling information related to a particular job List<String> getQueueSchedulingParameterList() - Returns ordered List of the scheduling parameters related to queues. The above two methods were introduced, to determine the the order in which the columns in a table have to be generated by the web UI. A new method was introduced in JobTracker: JobQueueTaskScheduler and LimitTasksPerJobTaskScheduler have been modified to implement the new API's to expose scheduling information. Have made changes in the jobtracker.jsp to do the following: Create a new section called Scheduler information and build a table dynamically for displaying the scheduler information pertaining to queues which scheduler holds. The order of the column is determined by value returned from getQueueSchedulingParameterList(). If a particular scheduler returns null for getQueueSchedulingParameterList, then the new section called Scheduler information is not displayed in the jobtracker.jsp Any thoughts on improving the above approach Sreekanth selected Submit Patch by mistake instead of attaching the patch. Canceling it on his behalf, as he's not added to the list of contributors yet.
UI Mockup with default JobQueueTaskScheduler
The jobs-page is actually useful to users who have submitted jobs. The scheduler information is typically important for cluster administrators. Does it make sense to make the scheduler information show up as a separate page rather than the same page that lists all user's jobs? Of course, power users might want to see scheduling information sometime.
Regarding where the scheduler information should be - there are two types of scheduler information:
Also, as we are discussing in
I must clarify that I was talking only about the List<String> getQueues() API, and not the rest of the methods proposed here. Sorry for any confusion caused. As mentioned here Some comments about the patch ( independent of where the scheduler information is displayed)
After thinking about this for a bit, I think a more natural interface for getSchedulingInfo would be:
class JobInProgress {
...
Object getSchedulerInfo() { ... }
void setSchedulerInfo(Object info) {...}
The scheduler can then add its own information directly into the JobInProgress. Clearly each scheduler would have its own type for scheduler info. The framework would use the scheduler info's toString() method to generate the string for the user. Thoughts? The framework would also need to handle null values in the scheduler info. Also note that this will replace the map in all of the schedulers that looks like:
Map<JobInProgress, JobInfo> infos = ... That's a good idea, and should make the schedulers more efficient as well.
A couple of points if we are moving towards Owen's proposal:
Please try to vote on your preferred UI approach, if any, so we can move this forward. I had an offline conversation with Owen and we came to the following proposal:
One thing we haven't discussed this far is the changes to the framework to aid the CLI.
Showing scheduling information related to a job seems easy. We can augment JobStatus to contain a String schedulerInfo. For showing the queue related information, one approach could be as follows: public class QueueInfo implements Writable { String queueName; String schedulerInfo; ... } public interface JobSubmissionProtocol { ... QueueInfo[] getQueues(); QueueInfo getQueueInfo(String queue); JobStatus[] getJobs(String queue); } These APIs are similar to the Job related APIs, like getAllJobs(), getJobStatus(JobID), and getMap/ReduceTaskReports(JobID). Still, I am a little worried adding these to JobSubmissionProtocol since getQueues() and getQueueInfo() don't per-se relate to jobs directly. The alternative though seems to be to define a new protocol that has this info. Open to comments on which is better. QueueInfo should have private fields and getters.
JobSubmissionProtocol is not public and therefore the JobClient needs identical methods. Attaching a patch with following changes according to Owens and Hemanth's comments:
Added following method to JobSubmissionProtocol public JobQueueInfo getJobQueueInfo(String queue); public JobQueueInfo[] getJobQueueInfos(); public JobStatus[] getAllJobs(String queue); Added a new method to TaskScheduler public abstract Collection<JobInProgress> getJobs(String queueName); Added a new class to encapsulate the Scheduling information related to Job Queues :: JobQueueInfo Added new jsp page to display queue details and list of jobs held by the queue along with the Queue Scheduling Information: jobqueue_details.jsp Refactored Job Table generation into a new class in org.apache.hadoop.mapred.JSPUtil Added new command line options in the JobClient.java Currently the patch has no test case attached alongwith it. Would be attaching them soon. Attaching a new patch with following changes:
Made modification to CapacityTaskScheduler to use the new API for the web UI and command line interface Added test case which makes use of the new methods introduced in the JobSubmissionProtocol Updated the JobQueueInfo class because of a NullPointerException being thrown for the default scheduler, pointed out by Hemanth.
Fixed a findbugs warnings in LimitTasksPerJobTaskScheduler regarding incorrect synchronization. JobTracker:
JobQueueInfo:
JspUtil:
JobSubmissionProtocol:
JobClient:
jobqueue_details.jsp:
jobtracker.jsp:
CapacityTaskScheduler:
TestJobQueueInformation:
Made modifications according to comments. I have also mentioned reasons why somethings are left as it is from previous version of the patch.
Code repetition for converting collection JobInProgress to an array of JobStatus has been removed. Modified getAllJobs and getAllJobs(Queue). Left jobsToComplete as is.
The reason why we are using an object and passing only String over wire is because we are setting scheduling information only once. Then underlying reference of the scheduling information is updated by the respective TaskScheduler's and we do a toString() while passing over wire. This way we can avoid to constantly update the scheduling information in queue manager. For example check CapacityTaskScheduler.
It is using JSPHelper from the package to generate the percentage graph. Maybe that method should be moved into ServletUtil class in core util package.
If a queue supports priority or not is stored by the JobQueueManager in capacity scheduler. The queue scheduling information object does not contain if a particular queue can support priority or not. So that is why there is a seperate field.
Reason why I am not using JobClient directly is because: by calling them we are going to call up display methods, if we call up display methods then we would have to parse the output of the job client and then do the test for equality. Moreover all the display method newly defined are private. If it is really required I can make them public then change test to parse the display string and test equality.
I was thinking of something like: private JobStatus[] getJobStatus(Collection<JobInProgress> jips, boolean onlyRunning) { // .. if (onlyRunning) { // consider only jobs which are running or prep. } } Would that work ? Regarding tests, taking cue from APIs like getAllJobs, I think it is OK to provide wrapper APIs around the queue info related methods. These could be package private and the test case can directly access these. So, something like: JobQueueInfo[] getJobQueueInfos() { return jobSubmitClient.getJobQueueInfos(); }
private void displayQueueList() {
JobQueueInfo[] queues = getJobQueueInfos();
//
}
Agree with rest of your explanations. Actually going over some of the comments above, I see this comment from Owen:
So, this agrees with what I've proposed above. In fact, we should make the APIs public and not package private. Looked at other changes. They seem fine to me.
Except in jobqueue_details.jsp, there's a line coming at the end as follows: Hadoop, 2008. \ No newline at end of file > we should make the APIs public and not package private.
That's the subject of HADOOP-3822. Let's not introduce new public APIs in this issue. Refactored getJobsToComplete, getAllJobs and _getAllJobs(queue) in JobTracker.
Introduced two new package level methods for getting JobQueueInfo from JobClient and used the same in test case. Fixed No newline at end of file message I think we need to have a separate class for the queue command. Otherwise, users can cross the commands like:
which would be confusing. With separate classes, we can only support the sub-commands that make sense. Attaching patch with following modification from the previous patch:
Incorporating comments from Hemanth:
In addition to the above changes, following also has been modified according to Hemanth's comments:
One nit. Previously, the JobTracker was setting the start time in the JobStatus for all the jobs. This is missing from the refactored code, and hence client is showing start time as 0.
Other than that, it looks good to me. Fixing the start time issue, which was missed out while refactoring the code.
This is getting close, but I have a few suggestions.
When I asked you to split the queue queries out of JobClient, I didn't think about the API. I think the API is better in JobClient and JobQueueClient is only about the main that supports the cli commands. JobQueueClient shouldn't be a public class, because otherwise it ends up in the public API. So the API access is still through JobClient and JobQueueClient implements Tool, etc. Let's make JobSubmissionProtocol.getJobQueueInfos to getQueues(). getJobQueueInfo should be getQueueInfo(queueName). The methods in JobQueueClient should be public, moved to JobClient, and renamed:
mapred.JSPUtil should not be public. Several of the new public API classes and methods are missing javadoc. JobQueueInfo.schedulerInfo should be a string, rather than an object. Since the serialization forces it to be a string, it should just be typed/stored that way. The QueueManager should probably have a map like: Map<String, Object> schedulerInfo; // map from queue name to scheduler specific object and just create the JobQueueInfo when the JobSubmissionProtocol methods are called. The constructor should take the two strings and don't bother with the setSchedulerInfo. I'm not very happy with ClientUtil. It seems like a weak abstraction. Is it really necessary, especially if you fold back into JobClient? Attaching patch with following changes:
Adding Apache license to TestJobQueueInformation and JSPUtil according to Hemanth's comment.
I am pasting inline the output of ant test-patch on my local machine.
[exec] +1 overall.
[exec]
[exec] +1 @author. The patch does not contain any @author tags.
[exec]
[exec] +1 tests included. The patch appears to include 3 new or modified tests.
[exec]
[exec] +1 javadoc. The javadoc tool did not generate any warning messages.
[exec]
[exec] +1 javac. The applied patch does not increase the total number of javac compiler warnings.
[exec]
[exec] +1 findbugs. The patch does not introduce any new Findbugs warnings.
ant test-core and ant test-patch did not generate any build failure, on today's trunk on my local machine I just committed this. Thanks, Sreekanth.
Seems that there is a NPE in LimitTasksPerJobTaskScheduler. See
Integrated in Hadoop-trunk #611 (See http://hudson.zones.apache.org/hudson/job/Hadoop-trunk/611/
|
||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
getSchedulingInfo() should really return key-value pairs for queues, not for jobs. In the
HADOOP-3445scheduler, for example, we need to display scheduling information associated with a queue - its capacity (both 'guaranteed' and 'allocated'), how many unique users have submitted jobs, how many tasks are running, how many are waiting. etc. This information is per queue, and doesn't make sense per job. I'd much rather have getSchedulingInfo() take in a queue name as a parameter, if we make queues explicit. In fact, I don't see what kind of scheduling information you'd associate with a job. Matei, do you have examples of what getSchedulingInfo would return for jobs?Similarly, getJobComparator() makes more sense when applied to a queue. In 3445, jobs are ordered per queue, and there is no global ordering. Furthermore, doesn't it make more sense to get a sorted collection of jobs, per queue, back from the scheduler, rather than a Comparator? Or are you imagining the UI and CLI to maintain a list of jobs all the time and then apply the comparator periodically?