
Key: HADOOP-4559
Type: New Feature
Status: Open
Priority: Trivial
Assignee: Unassigned
Reporter: Florian Leibert
Votes: 3
Watchers: 16
Hadoop Common

Rest API for retrieving job / task statistics

Created: 31/Oct/08 03:55 PM   Updated: 30/Jun/09 12:03 AM
Component/s: None
Affects Version/s: None
Fix Version/s: None

Time Tracking:
Original Estimate: 2h
Remaining Estimate: 2h
Time Spent: Not Specified

File Attachments:
HADOOP-4559v2.patch (text file, 17 kB, licensed for inclusion in ASF works), attached 2008-12-22 03:54 PM by Florian Leibert

Release Note: Adds API features to the webapp part of Hadoop that allow retrieving task statistics for a given job.


 Description
A REST API that returns simple JSON containing information about a given job, such as min/max/avg times per task, failed tasks, etc. This would be useful for letting an external process restart a run or modify its parameters.

 Comments
Florian Leibert added a comment - 31/Oct/08 04:14 PM
This will provide a very simple API for retrieving statistics about the tasks of a given job ID, such as average, min and max times per task, failed tasks per job, total job runtime, etc.
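
To make the proposed output concrete, here is a minimal sketch of how such per-task statistics could be aggregated into a small JSON document. It is not taken from the attached patch: the class name, the `TaskRun` record, and the JSON field names are all hypothetical, and it assumes job IDs contain no characters that need JSON escaping.

```java
import java.util.Arrays;
import java.util.List;
import java.util.Locale;

final class TaskStatsSketch {
    /** Hypothetical record of one finished task; not a Hadoop class. */
    static final class TaskRun {
        final long durationMs;
        final boolean failed;

        TaskRun(long durationMs, boolean failed) {
            this.durationMs = durationMs;
            this.failed = failed;
        }
    }

    /** Aggregates min/max/avg task time and the failed-task count into JSON. */
    static String summarize(String jobId, List<TaskRun> runs) {
        long min = 0, max = 0, total = 0;
        int failed = 0;
        boolean first = true;
        for (TaskRun r : runs) {
            min = first ? r.durationMs : Math.min(min, r.durationMs);
            max = first ? r.durationMs : Math.max(max, r.durationMs);
            total += r.durationMs;
            if (r.failed) {
                failed++;
            }
            first = false;
        }
        // An empty job reports zeros rather than dividing by zero.
        double avg = runs.isEmpty() ? 0.0 : (double) total / runs.size();
        // Field names are assumptions made for this sketch.
        return String.format(Locale.ROOT,
            "{\"jobId\":\"%s\",\"tasks\":%d,\"failedTasks\":%d,"
                + "\"minMs\":%d,\"maxMs\":%d,\"avgMs\":%.1f}",
            jobId, runs.size(), failed, min, max, avg);
    }

    public static void main(String[] args) {
        List<TaskRun> runs = Arrays.asList(
            new TaskRun(1000, false), new TaskRun(3000, true));
        System.out.println(summarize("job_example_0001", runs));
    }
}
```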

Steve Loughran added a comment - 03/Nov/08 11:14 AM
  • Although it's a JSP page, everything, including printing, is done in Java code. It would be better either implemented as a pure servlet, or with the output redone as <%= %> operations to produce something more JSP-y.
  • I recommend HtmlUnit as the best extension to JUnit for testing web pages; it could grab the pages and look at the content.

Hadoop QA added a comment - 03/Nov/08 05:00 PM
-1 overall. Here are the results of testing the latest attachment
http://issues.apache.org/jira/secure/attachment/12393159/HADOOP-4559.patch
against trunk revision 709609.

+1 @author. The patch does not contain any @author tags.

-1 tests included. The patch doesn't appear to include any new or modified tests.
Please justify why no tests are needed for this patch.

+1 javadoc. The javadoc tool did not generate any warning messages.

+1 javac. The applied patch does not increase the total number of javac compiler warnings.

+1 findbugs. The patch does not introduce any new Findbugs warnings.

+1 Eclipse classpath. The patch retains Eclipse classpath integrity.

+1 core tests. The patch passed core unit tests.

+1 contrib tests. The patch passed contrib unit tests.

Test results: http://hudson.zones.apache.org/hudson/job/Hadoop-Patch/3515/testReport/
Findbugs warnings: http://hudson.zones.apache.org/hudson/job/Hadoop-Patch/3515/artifact/trunk/build/test/findbugs/newPatchFindbugsWarnings.html
Checkstyle results: http://hudson.zones.apache.org/hudson/job/Hadoop-Patch/3515/artifact/trunk/build/test/checkstyle-errors.html
Console output: http://hudson.zones.apache.org/hudson/job/Hadoop-Patch/3515/console

This message is automatically generated.


Paco Nathan added a comment - 03/Nov/08 07:44 PM - edited
HADOOP-4559 provides a workaround for part of the issue described in HADOOP-3850. Log data can now be accessed by making REST calls to the JSP provided in 3850. For example:

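import org.apache.commons.httpclient.HttpClient;
import org.apache.commons.httpclient.HttpMethod;
import org.apache.commons.httpclient.methods.GetMethod;
import org.apache.hadoop.mapred.JobClient;
import org.apache.hadoop.mapred.JobID;
import org.apache.hadoop.mapred.RunningJob;

// Run the job to completion, then ask the JobTracker web UI for its details.
RunningJob currentjob = JobClient.runJob(job_conf);

JobID id = currentjob.getID();
String url = "http://localhost:50030/api.jsp?info=jobdetails&id=" + id.getId();

HttpClient client = new HttpClient();
HttpMethod method = new GetMethod(url);

String logData;
try {
  client.executeMethod(method);
  logData = method.getResponseBodyAsString();
} finally {
  method.releaseConnection();  // release the connection even if the call fails
}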


Chris Douglas added a comment - 08/Nov/08 12:49 AM

> although it's a JSP page, everything, including printing, is done in Java code. It would either be better implemented as a pure servlet

+1

  • Please format the code according to the conventions.
  • There's a fair amount of dead code in this patch, e.g.
    +	    StringBuffer sb = new StringBuffer();
    +	    boolean isFirst = true;
    +	    for (String kv : kv_pairs) {
    +	    	
    +			sb.append(kv);	
    +		}
    

    kv_pairs is initialized, but empty. sb is unused, save in this loop. The loop above it doesn't appear to do any productive work. StringBuilder should be used instead of StringBuffer in this context.

  • If you're proposing this as a public API, it must at least have a unit test.
  • Isn't most of this provided through job history?
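
On the dead-code point above, here is a sketch of what that loop may have been meant to do. It assumes the intent was to join the key/value pairs with a separator, which is only a guess based on the unused `isFirst` flag; the class and method names are hypothetical. It uses `StringBuilder`, since the buffer never leaves the method and needs no synchronization.

```java
import java.util.Arrays;
import java.util.List;

final class KvJoinSketch {
    /** Joins the pairs with the separator, writing it between elements only. */
    static String join(List<String> kvPairs, String separator) {
        StringBuilder sb = new StringBuilder();
        boolean isFirst = true;
        for (String kv : kvPairs) {
            if (!isFirst) {
                sb.append(separator);
            }
            sb.append(kv);
            isFirst = false;
        }
        return sb.toString();
    }

    public static void main(String[] args) {
        System.out.println(join(Arrays.asList("min=10", "max=42"), ","));
    }
}
```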

Paco Nathan added a comment - 26/Nov/08 01:51 AM
> Isn't most of this provided through job history?

No, not really. Not if a long-running workflow requires these measurements for automated decisions.

While a human can read the job history data from JSP pages, there's no current means for the app code which calls ToolRunner to obtain that data and use it to alter the workflow.
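
As a rough illustration of the kind of automated decision meant here, the sketch below maps a few job statistics to a workflow action. The thresholds, the `Action` names, and the method signature are hypothetical and are not part of the patch.

```java
final class WorkflowDecisionSketch {
    enum Action { CONTINUE, RETUNE, RESTART }

    // Hypothetical threshold: restart when more than a quarter of tasks failed.
    static final double MAX_FAILURE_RATIO = 0.25;

    /** Picks the next workflow step from statistics fetched for a finished job. */
    static Action decide(int failedTasks, int totalTasks,
                         long maxTaskMs, long slowTaskLimitMs) {
        if (totalTasks <= 0) {
            return Action.CONTINUE;  // nothing ran, nothing to react to
        }
        double failureRatio = (double) failedTasks / totalTasks;
        if (failureRatio > MAX_FAILURE_RATIO) {
            return Action.RESTART;
        }
        if (maxTaskMs > slowTaskLimitMs) {
            return Action.RETUNE;    // e.g. adjust parameters before the next run
        }
        return Action.CONTINUE;
    }

    public static void main(String[] args) {
        System.out.println(decide(5, 10, 1000L, 5000L));
    }
}
```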


Bill de hOra added a comment - 28/Nov/08 01:28 AM
JobID id = currentjob.getID();
String url = "http://localhost:50030/api.jsp?info=jobdetails&id=" + id.getId();

Can't you just call this a JSP into the jobtracker instead? I hate to nitpick, but the request is not REST style (the client constructs the URL), nor is the response (it contains no links), and ASF code should (imvho) know the difference. If you want to build REST-style tooling around the tracker, I'd be happy to help with that. For example, scaling this up to a lot of jobs and/or a lot of clients will require something that doesn't hammer the tracker. And iterating over the tracker seems like a linear bottleneck; an O(1) key lookup would be much better.
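
The two points above can be sketched as follows, under stated assumptions: a hash map keyed by job ID stands in for the O(1) lookup (versus scanning every job), and the response carries links so the client does not have to construct URLs itself. The class names, the `/jobs/...` paths, and the JSON layout are hypothetical; nothing here reflects the actual JobTracker internals.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

final class JobIndexSketch {
    /** Hypothetical per-job summary; not a Hadoop class. */
    static final class JobSummary {
        final String jobId;
        final int failedTasks;

        JobSummary(String jobId, int failedTasks) {
            this.jobId = jobId;
            this.failedTasks = failedTasks;
        }
    }

    private final List<JobSummary> all = new ArrayList<JobSummary>();
    private final Map<String, JobSummary> byId = new HashMap<String, JobSummary>();

    void add(JobSummary s) {
        all.add(s);
        byId.put(s.jobId, s);
    }

    /** Linear scan: cost grows with the number of jobs. */
    JobSummary findByScan(String jobId) {
        for (JobSummary s : all) {
            if (s.jobId.equals(jobId)) {
                return s;
            }
        }
        return null;
    }

    /** Hash lookup: expected constant time regardless of job count. */
    JobSummary findByKey(String jobId) {
        return byId.get(jobId);
    }

    /** Response that links to related resources (paths are assumptions). */
    static String toJsonWithLinks(JobSummary s) {
        return "{\"jobId\":\"" + s.jobId + "\",\"failedTasks\":" + s.failedTasks
            + ",\"links\":{\"self\":\"/jobs/" + s.jobId
            + "\",\"tasks\":\"/jobs/" + s.jobId + "/tasks\"}}";
    }

    public static void main(String[] args) {
        JobIndexSketch index = new JobIndexSketch();
        index.add(new JobSummary("job_example_0001", 2));
        System.out.println(toJsonWithLinks(index.findByKey("job_example_0001")));
    }
}
```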


Florian Leibert added a comment - 22/Dec/08 03:54 PM
The previous version was a bit dirty. I think this one is quite an improvement. We're using it to gather a lot of stats for our job runs. It's not a servlet and doesn't contain HtmlUnit: I think one stats JSP doesn't justify adding another library to the distribution, and for the sake of simplicity this remains a JSP. Hope this is valuable for someone else as well; it really is useful for us to track performance when modifying our algorithm.

Steve Loughran added a comment - 04/Feb/09 11:19 PM
+1 to Bill's idea for a RESTy API, one that works long-haul.