Issue Details

Key: MAPREDUCE-381
Type: Sub-task
Status: Closed
Resolution: Fixed
Priority: Major
Assignee: Devaraj Das
Reporter: Devaraj Das
Votes: 0
Watchers: 3
Project: Hadoop Map/Reduce
Parent issue: MAPREDUCE-378

Add framework hooks to get the running/completed/pending tasks for a given job. Add a way to query the list of currently active tasktrackers from the JobTracker.

Created: 09/Dec/08 07:21 AM   Updated: 20/Jun/09 07:51 AM
Component/s: None
Affects Version/s: None
Fix Version/s: None

Time Tracking:
Not Specified

File Attachments:
  File           Date                  Author        Size
  4807.3.patch   2008-12-11 04:38 PM   Devaraj Das   29 kB
  4807.patch     2008-12-10 06:39 PM   Devaraj Das   23 kB
  (Both attachments are licensed for inclusion in ASF works.)

Resolution Date: 12/Dec/08 05:56 AM


Description
Add framework hooks to get the IDs of running/completed/pending tasks for a given job. Add a way to query the list of currently active tasktrackers from the JobTracker. These are required to inject failures.

Comments
Devaraj Das added a comment - 10/Dec/08 06:39 PM
Attached patch provides the functionality. The patch adds a new class TaskInProgressStatus that is passed along in the status report (like when the method JobSubmissionProtocol.getMapTaskReports() is invoked) to the client. Also, the ClusterStatus now has the list of active trackers.
Provided the following new command line options in the JobClient:
1) -list-trackers : displays the list of active trackers in the cluster
2) -list-attempt-ids <jobId> <task-type> <task-state> : displays the list of tasks for a given job of a given type (map or reduce) that are currently in a particular state (running or completed).
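A minimal sketch of the filtering behind -list-attempt-ids, assuming a simplified in-memory model of one job's attempts. AttemptInfo, TaskType, TaskState and ListAttemptIds are hypothetical stand-ins, not the classes added by the patch; the sketch only keeps the attempt IDs whose type and state both match the command-line arguments.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Locale;

// Hypothetical, simplified model for illustration; not Hadoop's real classes.
enum TaskType { MAP, REDUCE }

enum TaskState { RUNNING, COMPLETED }

final class AttemptInfo {
    final String attemptId;
    final TaskType type;
    final TaskState state;

    AttemptInfo(String attemptId, TaskType type, TaskState state) {
        this.attemptId = attemptId;
        this.type = type;
        this.state = state;
    }
}

final class ListAttemptIds {
    // Mirrors "-list-attempt-ids <jobId> <task-type> <task-state>" for one job:
    // parses the type/state arguments case-insensitively and returns the IDs
    // of attempts matching both. Unknown values throw IllegalArgumentException.
    static List<String> select(List<AttemptInfo> attempts, String typeArg, String stateArg) {
        TaskType type = TaskType.valueOf(typeArg.toUpperCase(Locale.ROOT));
        TaskState state = TaskState.valueOf(stateArg.toUpperCase(Locale.ROOT));
        List<String> ids = new ArrayList<>();
        for (AttemptInfo a : attempts) {
            if (a.type == type && a.state == state) {
                ids.add(a.attemptId);
            }
        }
        return ids;
    }
}
```

Parsing the arguments with Enum.valueOf means an unrecognised type or state fails fast instead of silently matching nothing.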

Arun C Murthy added a comment - 11/Dec/08 05:30 AM
Minor comments:
  1. I'd suggest we deprecate both the current ClusterStatus constructors and use only the new one; this also simplifies write and readFields.
  2. Rather than introduce {set|is}{Failed|Pending|Completed|Running|Killed} methods in TaskInProgressStatus I'd suggest a single {get|set}Status which takes/returns the enum which is called with appropriate values from TaskInProgress.generateSingleReport.
  3. There are some debugging statements left over.

Long term: It would be really great to use the TaskInProgressStatus.TaskInProgressStatusType to maintain the TIP's state rather than all the booleans...
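Arun's second point and his long-term note can be sketched as follows, assuming a simplified report object: one enum-valued status replaces the five boolean setters/getters. TipState, TaskStatusReport and fromFlags are hypothetical names, and the precedence used in fromFlags (failed, then killed, then complete, then running, else pending) is an assumption for illustration, not necessarily what TaskInProgress.generateSingleReport does.

```java
import java.util.Objects;

// Hypothetical enum for illustration; one value replaces five boolean flags.
enum TipState { PENDING, RUNNING, COMPLETE, KILLED, FAILED }

final class TaskStatusReport {
    private TipState status = TipState.PENDING;

    // Single setter/getter pair instead of set/is{Failed,Pending,Completed,Running,Killed}.
    void setStatus(TipState status) {
        this.status = Objects.requireNonNull(status, "status");
    }

    TipState getStatus() {
        return status;
    }

    // Collapses the booleans a task-in-progress might track into one state.
    // The precedence order here is an assumption made for this sketch.
    static TipState fromFlags(boolean failed, boolean killed, boolean complete, boolean running) {
        if (failed) {
            return TipState.FAILED;
        }
        if (killed) {
            return TipState.KILLED;
        }
        if (complete) {
            return TipState.COMPLETE;
        }
        if (running) {
            return TipState.RUNNING;
        }
        return TipState.PENDING;
    }
}
```

Because callers read a single value, a contradictory combination such as "running and completed" can no longer be reported.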


Arun C Murthy added a comment - 11/Dec/08 05:31 AM
Super minor nit: I'd prefer to rename TaskInProgressStatusType to just Status! :)

Sharad Agarwal added a comment - 11/Dec/08 06:19 AM
Clients may be interested in the names of blacklisted trackers as well. I think -list-trackers should list all trackers by default and perhaps take an argument to list only the blacklisted or active ones?

Devaraj Das added a comment - 11/Dec/08 04:38 PM
Thanks Arun/Sharad for looking at this. I have incorporated the comments.
1) Made TaskInProgressStatusType a separate class all by itself and called it TIPStatus
2) Factored TaskInProgressStatus functionality into TaskReport since that is where it is needed
3) Added a new getClusterStatus method that takes a boolean argument: if it is true, the task tracker names are set as well; if it is false, only the counts of blacklisted/active trackers are set (as is the behavior today). I did this to avoid iterating over the tasktracker list on every call to assignTasks in CapacityScheduler, for example.
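The boolean-flag getClusterStatus described in point 3 can be sketched like this, assuming the active and blacklisted tracker sets are disjoint and held in memory. TrackerRegistry and ClusterSnapshot are hypothetical stand-ins for the JobTracker and ClusterStatus, not their real APIs; the point is only that the cheap path returns counts without touching the name lists.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;

// Hypothetical stand-in for a cluster-status snapshot.
final class ClusterSnapshot {
    final int activeCount;
    final int blacklistedCount;
    final List<String> activeNames;      // empty unless a detailed status was requested
    final List<String> blacklistedNames; // empty unless a detailed status was requested

    ClusterSnapshot(int activeCount, int blacklistedCount,
                    List<String> activeNames, List<String> blacklistedNames) {
        this.activeCount = activeCount;
        this.blacklistedCount = blacklistedCount;
        this.activeNames = activeNames;
        this.blacklistedNames = blacklistedNames;
    }
}

// Hypothetical stand-in for the JobTracker's view of its trackers.
final class TrackerRegistry {
    private final List<String> active = new ArrayList<>();
    private final List<String> blacklisted = new ArrayList<>();

    void addActive(String name) { active.add(name); }

    void addBlacklisted(String name) { blacklisted.add(name); }

    ClusterSnapshot getClusterStatus(boolean detailed) {
        if (!detailed) {
            // Cheap path for frequent callers (e.g. a scheduler): counts only,
            // no walk over or copy of the tracker names.
            return new ClusterSnapshot(active.size(), blacklisted.size(),
                    Collections.emptyList(), Collections.emptyList());
        }
        // Detailed path for clients such as a tracker-listing command:
        // read-only copies of both name lists.
        return new ClusterSnapshot(active.size(), blacklisted.size(),
                Collections.unmodifiableList(new ArrayList<>(active)),
                Collections.unmodifiableList(new ArrayList<>(blacklisted)));
    }
}
```

Returning both name lists in the detailed case also leaves room for Sharad's suggestion of listing all, active, or blacklisted trackers from the client side.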

Sharad Agarwal added a comment - 12/Dec/08 05:40 AM
Looks good.
Please make a note to document the new commands in the commands manual.

Devaraj Das added a comment - 12/Dec/08 05:56 AM
All test cases and test-patch passed. I committed this.