Issue Details (XML | Word | Printable)

Key: HADOOP-544
Type: Improvement Improvement
Status: Closed Closed
Resolution: Fixed
Priority: Major Major
Assignee: Enis Soztutar
Reporter: Owen O'Malley
Votes: 0
Watchers: 0
Operations

If you were logged in you would be able to see more operations.
Hadoop Common

Replace the job, tip and task ids with objects.

Created: 19/Sep/06 05:47 AM   Updated: 08/Jul/09 04:51 PM
Return to search
Component/s: None
Affects Version/s: 0.18.0
Fix Version/s: 0.18.0

Time Tracking:
Not Specified

File Attachments:
  Size
Text File Licensed for inclusion in ASF works id_v1.patch 2007-12-07 04:45 PM Enis Soztutar 193 kB
Text File Licensed for inclusion in ASF works id_v2.patch 2007-12-18 03:52 PM Enis Soztutar 210 kB
Text File Licensed for inclusion in ASF works id_v3.patch 2007-12-19 07:23 AM Enis Soztutar 213 kB
Text File Licensed for inclusion in ASF works id_v4.patch 2008-01-04 02:02 PM Enis Soztutar 259 kB
Text File Licensed for inclusion in ASF works id_v5.patch 2008-04-22 11:01 AM Enis Soztutar 278 kB
Text File Licensed for inclusion in ASF works id_v6.patch 2008-04-29 01:38 PM Enis Soztutar 279 kB
Text File Licensed for inclusion in ASF works id_v7.patch 2008-04-29 11:36 PM Owen O'Malley 278 kB
Text File Licensed for inclusion in ASF works id_v8.patch 2008-04-30 08:50 AM Enis Soztutar 287 kB
Text File Licensed for inclusion in ASF works id_wip1.patch 2007-11-16 10:06 AM Enis Soztutar 8 kB

Hadoop Flags: Reviewed, Incompatible change
Release Note:
Introduced new classes JobID, TaskID and TaskAttemptID, which should be used instead of their string counterparts. Deprecated functions in JobClient, TaskReport, RunningJob, jobcontrol.Job and TaskCompletionEvent that use string arguments. Applications can use xxxID.toString() and xxxID.forName() methods to convert/restore objects to/from strings.
Resolution Date: 30/Apr/08 12:23 PM


 Description  « Hide
I think that it is silly to have tools parsing the strings that the framework builds for task ids. I propose:

class JobId implements Writable {
public int getJobId() {...}
}

class TaskId implements Writable {
public JobId getJobId();
public boolean isMap() { ... }
public int getTaskId() { ... }
}

class TaskAttemptId implements Writable {
public TaskId getTaskId();
public int getAttemptId();
}

each of the classes will have a toString() method that generates the current string.



 All   Comments   Work Log   Change History   Subversion Commits      Sort Order: Ascending order - Click to sort in descending order
Enis Soztutar added a comment - 16/Nov/07 10:06 AM
Bringing up this old issue, I am really uncomfortable with the ids handled as strings. The attached patch is a proposal for the new XXXId classes that we may use. I intend to continue the progress once we have grounded the need and the design for this issue. Any thoughts ?

Enis Soztutar added a comment - 07/Dec/07 04:45 PM
This 5k+ patch introduces three new classes TaskId, JobId and TaskInProgressId, and changes every occurrence of th e string counterparts to use objects.

Enis Soztutar added a comment - 07/Dec/07 04:47 PM
local tests pass with some unstability for contrib. Lets see what the hudson say.

Hadoop QA added a comment - 07/Dec/07 07:10 PM
-1 overall. Here are the results of testing the latest attachment
http://issues.apache.org/jira/secure/attachment/12371240/id_v1.patch
against trunk revision r602002.

@author +1. The patch does not contain any @author tags.

javadoc -1. The javadoc tool appears to have generated messages.

javac +1. The applied patch does not generate any new compiler warnings.

findbugs -1. The patch appears to introduce 5 new Findbugs warnings.

core tests -1. The patch failed core unit tests.

contrib tests -1. The patch failed contrib unit tests.

Test results: http://lucene.zones.apache.org:8080/hudson/job/Hadoop-Patch/1296/testReport/
Findbugs warnings: http://lucene.zones.apache.org:8080/hudson/job/Hadoop-Patch/1296/artifact/trunk/build/test/findbugs/newPatchFindbugsWarnings.html
Checkstyle results: http://lucene.zones.apache.org:8080/hudson/job/Hadoop-Patch/1296/artifact/trunk/build/test/checkstyle-errors.html
Console output: http://lucene.zones.apache.org:8080/hudson/job/Hadoop-Patch/1296/console

This message is automatically generated.


Doug Cutting added a comment - 07/Dec/07 07:44 PM
Some comments:
  • this incompatibly changes public APIs. we should instead create new methods and deprecate the old ones.
  • Id, JobId, etc. are preferred to ID, JobID, etc.: acronyms are best capitalized as words
  • JobId, etc. should probably be separate classes, not nested classes
  • the new classes need javadoc

Enis Soztutar added a comment - 18/Dec/07 03:52 PM
This patch addresses Doug's comments.
Changes include :
  • Id classes are now top level,
  • acronyms are capitalized as JobID, etc,
  • detailed javadoc is provided
  • Public methods of JobClient, jobcontrol.Job, TaskReport and TaskCompletionEvent are deprecated and new methods are introduced. Since we cannot override methods by changing only the return values, some of the get{Job|TIP|Task}ID() functions had to be renamed. This may not be nice, since now there can be two functions
    @Deprecated
    String TaskReport.getTaskId() { ... }
    public TaskInProgressID getTaskID() { ... }
    
    String RunningJob.getJobID() ;
    JobID RunningJob.getID();

    but I can not think of another way to keep backwards compatibility. We may get rid of the deprecated functions in 0.17. Until then we should live with them. Changes to JobSubmissionProtocol is done directly instead of deprecating. (see HADOOP-1643)

  • an issue with TaskLogAppender from the previous patch is fixed.

Hadoop QA added a comment - 18/Dec/07 04:57 PM
-1 overall. Here are the results of testing the latest attachment
http://issues.apache.org/jira/secure/attachment/12371871/id_v2.patch
against trunk revision r605220.

@author +1. The patch does not contain any @author tags.

javadoc -1. The javadoc tool appears to have generated messages.

javac +1. The applied patch does not generate any new compiler warnings.

findbugs -1. The patch appears to introduce 1 new Findbugs warnings.

core tests +1. The patch passed core unit tests.

contrib tests +1. The patch passed contrib unit tests.

Test results: http://lucene.zones.apache.org:8080/hudson/job/Hadoop-Patch/1382/testReport/
Findbugs warnings: http://lucene.zones.apache.org:8080/hudson/job/Hadoop-Patch/1382/artifact/trunk/build/test/findbugs/newPatchFindbugsWarnings.html
Checkstyle results: http://lucene.zones.apache.org:8080/hudson/job/Hadoop-Patch/1382/artifact/trunk/build/test/checkstyle-errors.html
Console output: http://lucene.zones.apache.org:8080/hudson/job/Hadoop-Patch/1382/console

This message is automatically generated.


Enis Soztutar added a comment - 19/Dec/07 07:24 AM
Patch with fixed javadoc warnings and findbugs warnings.

Hadoop QA added a comment - 19/Dec/07 08:54 AM
+1 overall. Here are the results of testing the latest attachment
http://issues.apache.org/jira/secure/attachment/12371930/id_v3.patch
against trunk revision r605457.

@author +1. The patch does not contain any @author tags.

javadoc +1. The javadoc tool did not generate any warning messages.

javac +1. The applied patch does not generate any new compiler warnings.

findbugs +1. The patch does not introduce any new Findbugs warnings.

core tests +1. The patch passed core unit tests.

contrib tests +1. The patch passed contrib unit tests.

Test results: http://lucene.zones.apache.org:8080/hudson/job/Hadoop-Patch/1392/testReport/
Findbugs warnings: http://lucene.zones.apache.org:8080/hudson/job/Hadoop-Patch/1392/artifact/trunk/build/test/findbugs/newPatchFindbugsWarnings.html
Checkstyle results: http://lucene.zones.apache.org:8080/hudson/job/Hadoop-Patch/1392/artifact/trunk/build/test/checkstyle-errors.html
Console output: http://lucene.zones.apache.org:8080/hudson/job/Hadoop-Patch/1392/console

This message is automatically generated.


Enis Soztutar added a comment - 04/Jan/08 02:02 PM
Patch updated to meet current trunk. The functions are consistently named as
get{Job|Task|TaskInProgress}ID() with some minor exceptions, where the change would cause more trouble than the benefits.

Hadoop QA added a comment - 06/Jan/08 05:45 AM
-1 overall. Here are the results of testing the latest attachment
http://issues.apache.org/jira/secure/attachment/12372507/id_v4.patch
against trunk revision .

@author +1. The patch does not contain any @author tags.

javadoc -1. The javadoc tool appears to have generated messages.

javac +1. The applied patch does not generate any new compiler warnings.

findbugs -1. The patch appears to cause Findbugs to fail.

core tests -1. The patch failed core unit tests.

contrib tests -1. The patch failed contrib unit tests.

Test results: http://lucene.zones.apache.org:8080/hudson/job/Hadoop-Patch/1483/testReport/
Checkstyle results: http://lucene.zones.apache.org:8080/hudson/job/Hadoop-Patch/1483/artifact/trunk/build/test/checkstyle-errors.html
Console output: http://lucene.zones.apache.org:8080/hudson/job/Hadoop-Patch/1483/console

This message is automatically generated.


Enis Soztutar added a comment - 06/Jan/08 06:51 PM
Retrying hudson tests, which failed due to HADOOP-2240.

Hadoop QA added a comment - 06/Jan/08 11:59 PM
+1 overall. Here are the results of testing the latest attachment
http://issues.apache.org/jira/secure/attachment/12372507/id_v4.patch
against trunk revision .

@author +1. The patch does not contain any @author tags.

javadoc +1. The javadoc tool did not generate any warning messages.

javac +1. The applied patch does not generate any new compiler warnings.

findbugs +1. The patch does not introduce any new Findbugs warnings.

core tests +1. The patch passed core unit tests.

contrib tests +1. The patch passed contrib unit tests.

Test results: http://lucene.zones.apache.org:8080/hudson/job/Hadoop-Patch/1491/testReport/
Findbugs warnings: http://lucene.zones.apache.org:8080/hudson/job/Hadoop-Patch/1491/artifact/trunk/build/test/findbugs/newPatchFindbugsWarnings.html
Checkstyle results: http://lucene.zones.apache.org:8080/hudson/job/Hadoop-Patch/1491/artifact/trunk/build/test/checkstyle-errors.html
Console output: http://lucene.zones.apache.org:8080/hudson/job/Hadoop-Patch/1491/console

This message is automatically generated.


Enis Soztutar added a comment - 07/Jan/08 10:03 AM
Feature freeze is soon, and we do not want this huge patch to cause trouble. Moving this to be fixed at 0.17.

Enis Soztutar added a comment - 02/Apr/08 02:24 PM
I could not find time to update the patch, moving this to 0.18.

Devaraj Das added a comment - 17/Apr/08 10:10 AM
Enis, could you please update the patch. This seems like a good time to get it in. Thanks!

Enis Soztutar added a comment - 17/Apr/08 11:21 AM
I think so! I'll update the patch as soon as I can.

Enis Soztutar added a comment - 22/Apr/08 11:01 AM
Patch updated to trunk, No major changes from the previous one.

Hadoop QA added a comment - 23/Apr/08 05:38 AM
+1 overall. Here are the results of testing the latest attachment
http://issues.apache.org/jira/secure/attachment/12380681/id_v5.patch
against trunk revision 645773.

@author +1. The patch does not contain any @author tags.

tests included +1. The patch appears to include 12 new or modified tests.

javadoc +1. The javadoc tool did not generate any warning messages.

javac +1. The applied patch does not generate any new javac compiler warnings.

release audit +1. The applied patch does not generate any new release audit warnings.

findbugs +1. The patch does not introduce any new Findbugs warnings.

core tests +1. The patch passed core unit tests.

contrib tests +1. The patch passed contrib unit tests.

Test results: http://hudson.zones.apache.org/hudson/job/Hadoop-Patch/2301/testReport/
Findbugs warnings: http://hudson.zones.apache.org/hudson/job/Hadoop-Patch/2301/artifact/trunk/build/test/findbugs/newPatchFindbugsWarnings.html
Checkstyle results: http://hudson.zones.apache.org/hudson/job/Hadoop-Patch/2301/artifact/trunk/build/test/checkstyle-errors.html
Console output: http://hudson.zones.apache.org/hudson/job/Hadoop-Patch/2301/console

This message is automatically generated.


Enis Soztutar added a comment - 23/Apr/08 12:14 PM
Committers, I kindly request that the patch is reviewed ASAP, so that we can include this with minumum overhead. Thanks, smile.

Owen O'Malley added a comment - 23/Apr/08 10:28 PM
I'd propose that we change the names of the id classes toward a less confusing system:

JobId
TaskId
TaskAttemptId

I think it is better not to import each of the values of enumerations, so that you always have the type name associated with the value.

Thanks for all of the hard work on this Enis!


Owen O'Malley added a comment - 23/Apr/08 10:31 PM
It also changes the signature on some public methods without deprecation. Is there a way to detect those?

Owen O'Malley added a comment - 23/Apr/08 11:58 PM
here is a tar ball generated by jdiff that lists the api changes by this patch. unfortunately, it really needs to be fixed or we will break a bunch of applications.

Enis Soztutar added a comment - 24/Apr/08 08:40 AM
Owen, thanks for the jdiff, I haven't though of that.
I checked the differences in the public APIs :
  • methods in CompletedJobStatusStore is changed w/o deprecation, but I think this class should be package private rather than public. Its methods will be accessed though JobClient leveraging facade design pattern, no?
  • API changes in JobClient, RunningJob, TaskCompletionEvent, TaskReport deprecates old ones, no problem here.
  • API changes in JobHistory : I am not sure if JobHistory is really public. There are HistoryViewer, jsp pages etc to view/parse job history so I do not know if it is actually used outside hadoop. If you think it is, then we can fix the patch to keep the old methods as well.
  • Although public, JobProfile, JobStatus, jobSubmissionProtocol(see HADOOP-1643), MapTaskStatus, and ReduceTaskStatus are never directly exposed to the user so the changes in them are acceptable I think.
  • methods in JT and TT, not sure if we need to change them, if so we can fix this.

The name sets {jobid, taskinprogressid, taskid} and {jobid, taskid, taskattemptid} is already confusing (sometimes even I get confused. ) I think the ultimate naming should be the latter, which makes more sense. However we have been using and exposing both naming schema. For example the task names start with tip_ or task_, we have JobClient#killTask(taskid) which accepts a task attempt id, and such, on the other hand JobHistory uses taskID-taskAttemptID, so the current situation is already inconsistent.

I suspect if we introduce JobID, TaskID, TaskAttemptID as a replacement of the deprecated JobClient methods, then it will cause some trouble. I propose we stick with the naming schema in this patch, and later in a bigger issue change all the naming (internal and external) to respect {jobid, taskid, taskattemptid} (as a side note refactoring TaskID to TaskAttemptID and TaskInProgressID to TaskID will be mind-blowing) . This new issue should be a blocker for 0.18 so that users not using trunk will not see this (current) patch. Thoughts ?


Arun C Murthy added a comment - 24/Apr/08 09:02 AM
Enis, I agree with Owen - I think it would be great if this patch can take the first step and turn the tide towards the right nomenclature. Thanks again for your patience...

Arun C Murthy added a comment - 24/Apr/08 09:08 AM
+1 for Enis's take on changing public classes which aren't really part of the Map-Reduce's User Interface... we can change {Map|Reduce}TaskStatus etc. without any harm/fear.

Mukund Madhugiri added a comment - 24/Apr/08 04:45 PM
I ran Sort benchmark on 500 nodes with the patch ( id_v5.patch) and see these results:

0.17.0 build: 85.3 mins
patched trunk build: 80.3 mins


Owen O'Malley added a comment - 24/Apr/08 04:55 PM
Which matches what I've seen in the profiler on the JobTracker, which is that we spend a lot of time looking up maps from string to X, for lots of different X's.

Enis Soztutar added a comment - 29/Apr/08 01:38 PM
A 5% improvement, wow !

Thanks to the help from Arun, I've updated the patch. This version is a first attempt to build the naming convention.
Now we have JobID, TaskID, TaskAttemptID classes. The prefixes for these are also changed to "job_", "task_" and "attempt", respectively. Surely any code which parsed these names from the strings will break, but thats exactly why we hereby introduce xxxID classes.

As a side note, we keep tipId and taskId variable names, in some places. It will be good if they are also refactored, but I prefer to delay this in a later issue.

Thanks for the benchmark, and the reviews.


Hadoop QA added a comment - 29/Apr/08 11:08 PM
-1 overall. Here are the results of testing the latest attachment
http://issues.apache.org/jira/secure/attachment/12381106/id_v6.patch
against trunk revision 645773.

@author +1. The patch does not contain any @author tags.

tests included +1. The patch appears to include 12 new or modified tests.

patch -1. The patch command could not apply the patch.

Console output: http://hudson.zones.apache.org/hudson/job/Hadoop-Patch/2343/console

This message is automatically generated.


Owen O'Malley added a comment - 29/Apr/08 11:36 PM
I removed the spurious changes to the TaskRunner import statements that were conflicting.

Hadoop QA added a comment - 30/Apr/08 02:11 AM
-1 overall. Here are the results of testing the latest attachment
http://issues.apache.org/jira/secure/attachment/12381151/id_v7.patch
against trunk revision 645773.

@author +1. The patch does not contain any @author tags.

tests included +1. The patch appears to include 12 new or modified tests.

javadoc +1. The javadoc tool did not generate any warning messages.

javac +1. The applied patch does not generate any new javac compiler warnings.

release audit +1. The applied patch does not generate any new release audit warnings.

findbugs +1. The patch does not introduce any new Findbugs warnings.

core tests -1. The patch failed core unit tests.

contrib tests +1. The patch passed contrib unit tests.

Test results: http://hudson.zones.apache.org/hudson/job/Hadoop-Patch/2346/testReport/
Findbugs warnings: http://hudson.zones.apache.org/hudson/job/Hadoop-Patch/2346/artifact/trunk/build/test/findbugs/newPatchFindbugsWarnings.html
Checkstyle results: http://hudson.zones.apache.org/hudson/job/Hadoop-Patch/2346/artifact/trunk/build/test/checkstyle-errors.html
Console output: http://hudson.zones.apache.org/hudson/job/Hadoop-Patch/2346/console

This message is automatically generated.


Devaraj Das added a comment - 30/Apr/08 06:08 AM
Resubmitting the patch. The test failures had something to do with MapReduce

Hadoop QA added a comment - 30/Apr/08 07:26 AM
-1 overall. Here are the results of testing the latest attachment
http://issues.apache.org/jira/secure/attachment/12381151/id_v7.patch
against trunk revision 645773.

@author +1. The patch does not contain any @author tags.

tests included +1. The patch appears to include 12 new or modified tests.

javadoc +1. The javadoc tool did not generate any warning messages.

javac +1. The applied patch does not generate any new javac compiler warnings.

release audit +1. The applied patch does not generate any new release audit warnings.

findbugs +1. The patch does not introduce any new Findbugs warnings.

core tests -1. The patch failed core unit tests.

contrib tests +1. The patch passed contrib unit tests.

Test results: http://hudson.zones.apache.org/hudson/job/Hadoop-Patch/2349/testReport/
Findbugs warnings: http://hudson.zones.apache.org/hudson/job/Hadoop-Patch/2349/artifact/trunk/build/test/findbugs/newPatchFindbugsWarnings.html
Checkstyle results: http://hudson.zones.apache.org/hudson/job/Hadoop-Patch/2349/artifact/trunk/build/test/checkstyle-errors.html
Console output: http://hudson.zones.apache.org/hudson/job/Hadoop-Patch/2349/console

This message is automatically generated.


Enis Soztutar added a comment - 30/Apr/08 08:50 AM
Updated version, introduces static xxxID.getxxxIDsPattern(...) methods to obtain regex patterns which match ID strings. Patch fixes the failed tests, which hand-crafted these patterns.

Hadoop QA added a comment - 30/Apr/08 12:06 PM
+1 overall. Here are the results of testing the latest attachment
http://issues.apache.org/jira/secure/attachment/12381173/id_v8.patch
against trunk revision 645773.

@author +1. The patch does not contain any @author tags.

tests included +1. The patch appears to include 15 new or modified tests.

javadoc +1. The javadoc tool did not generate any warning messages.

javac +1. The applied patch does not generate any new javac compiler warnings.

release audit +1. The applied patch does not generate any new release audit warnings.

findbugs +1. The patch does not introduce any new Findbugs warnings.

core tests +1. The patch passed core unit tests.

contrib tests +1. The patch passed contrib unit tests.

Test results: http://hudson.zones.apache.org/hudson/job/Hadoop-Patch/2351/testReport/
Findbugs warnings: http://hudson.zones.apache.org/hudson/job/Hadoop-Patch/2351/artifact/trunk/build/test/findbugs/newPatchFindbugsWarnings.html
Checkstyle results: http://hudson.zones.apache.org/hudson/job/Hadoop-Patch/2351/artifact/trunk/build/test/checkstyle-errors.html
Console output: http://hudson.zones.apache.org/hudson/job/Hadoop-Patch/2351/console

This message is automatically generated.


Devaraj Das added a comment - 30/Apr/08 12:23 PM
I just commited this. Thanks, Enis!

Enis Soztutar added a comment - 30/Apr/08 12:36 PM
Changed the names of the classes in the release note, I will also fix and recommit CHANGES.txt.

Hudson added a comment - 01/May/08 11:02 AM

Nigel Daley added a comment - 02/May/08 03:40 PM
For all the discussion about deprectatoin, etc, the patch for this issue has broken trunk! It no longer compiles on Hudson. The contrib eclipse plugin relied on an API that you considered non-Public (JobStatus and perhaps others). Given we have no well defined public API yet, I don't think you can assume that classes declared "public" aren't actually public. At a minimum, the release notes should indicated that methods were removed and not deprecated.

Build failure:
http://hudson.zones.apache.org/hudson/view/Hadoop/job/Hadoop-trunk/476/console

compile:
[echo] contrib: eclipse-plugin
[javac] Compiling 45 source files to /zonestorage/hudson/home/hudson/hudson/jobs/Hadoop-trunk/workspace/trunk/build/contrib/eclipse-plugin/classes
[javac] /zonestorage/hudson/home/hudson/hudson/jobs/Hadoop-trunk/workspace/trunk/src/contrib/eclipse-plugin/src/java/org/apache/hadoop/eclipse/server/HadoopServer.java:119: cannot find symbol
[javac] symbol : method getJobId()
[javac] location: class org.apache.hadoop.mapred.JobStatus
[javac] String jobId = status.getJobId();
[javac] ^
[javac] Note: Some input files use or override a deprecated API.
[javac] Note: Recompile with -Xlint:deprecation for details.
[javac] Note: Some input files use unchecked or unsafe operations.
[javac] Note: Recompile with -Xlint:unchecked for details.
[javac] 1 error

Can we get this fixed today?


Nigel Daley added a comment - 02/May/08 03:44 PM
I opened HADOOP-3338 to address the compilation failure on trunk

Hudson added a comment - 03/May/08 09:17 PM

Hudson added a comment - 22/Aug/08 12:34 PM