Hadoop Common
  1. Hadoop Common
  2. HADOOP-544

Replace the job, tip and task ids with objects.

    Details

    • Type: Improvement Improvement
    • Status: Closed
    • Priority: Major Major
    • Resolution: Fixed
    • Affects Version/s: 0.18.0
    • Fix Version/s: 0.18.0
    • Component/s: None
    • Labels:
      None
    • Hadoop Flags:
      Incompatible change, Reviewed
    • Release Note:
      Hide
      Introduced new classes JobID, TaskID and TaskAttemptID, which should be used instead of their string counterparts. Deprecated functions in JobClient, TaskReport, RunningJob, jobcontrol.Job and TaskCompletionEvent that use string arguments. Applications can use xxxID.toString() and xxxID.forName() methods to convert/restore objects to/from strings.
      Show
      Introduced new classes JobID, TaskID and TaskAttemptID, which should be used instead of their string counterparts. Deprecated functions in JobClient, TaskReport, RunningJob, jobcontrol.Job and TaskCompletionEvent that use string arguments. Applications can use xxxID.toString() and xxxID.forName() methods to convert/restore objects to/from strings.

      Description

      I think that it is silly to have tools parsing the strings that the framework builds for task ids. I propose:

      class JobId implements Writable {
      public int getJobId()

      {...}

      }

      class TaskId implements Writable {
      public JobId getJobId();
      public boolean isMap()

      { ... }
      public int getTaskId() { ... }

      }

      class TaskAttemptId implements Writable {
      public TaskId getTaskId();
      public int getAttemptId();
      }

      each of the classes will have a toString() method that generates the current string.

      1. id_v8.patch
        287 kB
        Enis Soztutar
      2. id_v7.patch
        278 kB
        Owen O'Malley
      3. id_v6.patch
        279 kB
        Enis Soztutar
      4. id_v5.patch
        278 kB
        Enis Soztutar
      5. id_v4.patch
        259 kB
        Enis Soztutar
      6. id_v3.patch
        213 kB
        Enis Soztutar
      7. id_v2.patch
        210 kB
        Enis Soztutar
      8. id_v1.patch
        193 kB
        Enis Soztutar
      9. id_wip1.patch
        8 kB
        Enis Soztutar

        Activity

        Hide
        Enis Soztutar added a comment -

        Bringing up this old issue, I am really uncomfortable with the ids handled as strings. The attached patch is a proposal for the new XXXId classes that we may use. I intend to continue the progress once we have grounded the need and the design for this issue. Any thoughts ?

        Show
        Enis Soztutar added a comment - Bringing up this old issue, I am really uncomfortable with the ids handled as strings. The attached patch is a proposal for the new XXXId classes that we may use. I intend to continue the progress once we have grounded the need and the design for this issue. Any thoughts ?
        Hide
        Enis Soztutar added a comment -

        This 5k+ patch introduces three new classes TaskId, JobId and TaskInProgressId, and changes every occurrence of th e string counterparts to use objects.

        Show
        Enis Soztutar added a comment - This 5k+ patch introduces three new classes TaskId, JobId and TaskInProgressId, and changes every occurrence of th e string counterparts to use objects.
        Hide
        Enis Soztutar added a comment -

        local tests pass with some unstability for contrib. Lets see what the hudson say.

        Show
        Enis Soztutar added a comment - local tests pass with some unstability for contrib. Lets see what the hudson say.
        Hide
        Hadoop QA added a comment -

        -1 overall. Here are the results of testing the latest attachment
        http://issues.apache.org/jira/secure/attachment/12371240/id_v1.patch
        against trunk revision r602002.

        @author +1. The patch does not contain any @author tags.

        javadoc -1. The javadoc tool appears to have generated messages.

        javac +1. The applied patch does not generate any new compiler warnings.

        findbugs -1. The patch appears to introduce 5 new Findbugs warnings.

        core tests -1. The patch failed core unit tests.

        contrib tests -1. The patch failed contrib unit tests.

        Test results: http://lucene.zones.apache.org:8080/hudson/job/Hadoop-Patch/1296/testReport/
        Findbugs warnings: http://lucene.zones.apache.org:8080/hudson/job/Hadoop-Patch/1296/artifact/trunk/build/test/findbugs/newPatchFindbugsWarnings.html
        Checkstyle results: http://lucene.zones.apache.org:8080/hudson/job/Hadoop-Patch/1296/artifact/trunk/build/test/checkstyle-errors.html
        Console output: http://lucene.zones.apache.org:8080/hudson/job/Hadoop-Patch/1296/console

        This message is automatically generated.

        Show
        Hadoop QA added a comment - -1 overall. Here are the results of testing the latest attachment http://issues.apache.org/jira/secure/attachment/12371240/id_v1.patch against trunk revision r602002. @author +1. The patch does not contain any @author tags. javadoc -1. The javadoc tool appears to have generated messages. javac +1. The applied patch does not generate any new compiler warnings. findbugs -1. The patch appears to introduce 5 new Findbugs warnings. core tests -1. The patch failed core unit tests. contrib tests -1. The patch failed contrib unit tests. Test results: http://lucene.zones.apache.org:8080/hudson/job/Hadoop-Patch/1296/testReport/ Findbugs warnings: http://lucene.zones.apache.org:8080/hudson/job/Hadoop-Patch/1296/artifact/trunk/build/test/findbugs/newPatchFindbugsWarnings.html Checkstyle results: http://lucene.zones.apache.org:8080/hudson/job/Hadoop-Patch/1296/artifact/trunk/build/test/checkstyle-errors.html Console output: http://lucene.zones.apache.org:8080/hudson/job/Hadoop-Patch/1296/console This message is automatically generated.
        Hide
        Doug Cutting added a comment -

        Some comments:

        • this incompatibly changes public APIs. we should instead create new methods and deprecate the old ones.
        • Id, JobId, etc. are preferred to ID, JobID, etc.: acronyms are best capitalized as words
        • JobId, etc. should probably be separate classes, not nested classes
        • the new classes need javadoc
        Show
        Doug Cutting added a comment - Some comments: this incompatibly changes public APIs. we should instead create new methods and deprecate the old ones. Id, JobId, etc. are preferred to ID, JobID, etc.: acronyms are best capitalized as words JobId, etc. should probably be separate classes, not nested classes the new classes need javadoc
        Hide
        Enis Soztutar added a comment -

        This patch addresses Doug's comments.
        Changes include :

        • Id classes are now top level,
        • acronyms are capitalized as JobID, etc,
        • detailed javadoc is provided
        • Public methods of JobClient, jobcontrol.Job, TaskReport and TaskCompletionEvent are deprecated and new methods are introduced. Since we cannot override methods by changing only the return values, some of the get {Job|TIP|Task}

          ID() functions had to be renamed. This may not be nice, since now there can be two functions

          @Deprecated
          String TaskReport.getTaskId() { ... }
          public TaskInProgressID getTaskID() { ... }
          
          String RunningJob.getJobID() ;
          JobID RunningJob.getID();
          

          but I can not think of another way to keep backwards compatibility. We may get rid of the deprecated functions in 0.17. Until then we should live with them. Changes to JobSubmissionProtocol is done directly instead of deprecating. (see HADOOP-1643)

        • an issue with TaskLogAppender from the previous patch is fixed.
        Show
        Enis Soztutar added a comment - This patch addresses Doug's comments. Changes include : Id classes are now top level, acronyms are capitalized as JobID, etc, detailed javadoc is provided Public methods of JobClient, jobcontrol.Job, TaskReport and TaskCompletionEvent are deprecated and new methods are introduced. Since we cannot override methods by changing only the return values, some of the get {Job|TIP|Task} ID() functions had to be renamed. This may not be nice, since now there can be two functions @Deprecated String TaskReport.getTaskId() { ... } public TaskInProgressID getTaskID() { ... } String RunningJob.getJobID() ; JobID RunningJob.getID(); but I can not think of another way to keep backwards compatibility. We may get rid of the deprecated functions in 0.17. Until then we should live with them. Changes to JobSubmissionProtocol is done directly instead of deprecating. (see HADOOP-1643 ) an issue with TaskLogAppender from the previous patch is fixed.
        Hide
        Hadoop QA added a comment -

        -1 overall. Here are the results of testing the latest attachment
        http://issues.apache.org/jira/secure/attachment/12371871/id_v2.patch
        against trunk revision r605220.

        @author +1. The patch does not contain any @author tags.

        javadoc -1. The javadoc tool appears to have generated messages.

        javac +1. The applied patch does not generate any new compiler warnings.

        findbugs -1. The patch appears to introduce 1 new Findbugs warnings.

        core tests +1. The patch passed core unit tests.

        contrib tests +1. The patch passed contrib unit tests.

        Test results: http://lucene.zones.apache.org:8080/hudson/job/Hadoop-Patch/1382/testReport/
        Findbugs warnings: http://lucene.zones.apache.org:8080/hudson/job/Hadoop-Patch/1382/artifact/trunk/build/test/findbugs/newPatchFindbugsWarnings.html
        Checkstyle results: http://lucene.zones.apache.org:8080/hudson/job/Hadoop-Patch/1382/artifact/trunk/build/test/checkstyle-errors.html
        Console output: http://lucene.zones.apache.org:8080/hudson/job/Hadoop-Patch/1382/console

        This message is automatically generated.

        Show
        Hadoop QA added a comment - -1 overall. Here are the results of testing the latest attachment http://issues.apache.org/jira/secure/attachment/12371871/id_v2.patch against trunk revision r605220. @author +1. The patch does not contain any @author tags. javadoc -1. The javadoc tool appears to have generated messages. javac +1. The applied patch does not generate any new compiler warnings. findbugs -1. The patch appears to introduce 1 new Findbugs warnings. core tests +1. The patch passed core unit tests. contrib tests +1. The patch passed contrib unit tests. Test results: http://lucene.zones.apache.org:8080/hudson/job/Hadoop-Patch/1382/testReport/ Findbugs warnings: http://lucene.zones.apache.org:8080/hudson/job/Hadoop-Patch/1382/artifact/trunk/build/test/findbugs/newPatchFindbugsWarnings.html Checkstyle results: http://lucene.zones.apache.org:8080/hudson/job/Hadoop-Patch/1382/artifact/trunk/build/test/checkstyle-errors.html Console output: http://lucene.zones.apache.org:8080/hudson/job/Hadoop-Patch/1382/console This message is automatically generated.
        Hide
        Enis Soztutar added a comment -

        Patch with fixed javadoc warnings and findbugs warnings.

        Show
        Enis Soztutar added a comment - Patch with fixed javadoc warnings and findbugs warnings.
        Hide
        Hadoop QA added a comment -

        +1 overall. Here are the results of testing the latest attachment
        http://issues.apache.org/jira/secure/attachment/12371930/id_v3.patch
        against trunk revision r605457.

        @author +1. The patch does not contain any @author tags.

        javadoc +1. The javadoc tool did not generate any warning messages.

        javac +1. The applied patch does not generate any new compiler warnings.

        findbugs +1. The patch does not introduce any new Findbugs warnings.

        core tests +1. The patch passed core unit tests.

        contrib tests +1. The patch passed contrib unit tests.

        Test results: http://lucene.zones.apache.org:8080/hudson/job/Hadoop-Patch/1392/testReport/
        Findbugs warnings: http://lucene.zones.apache.org:8080/hudson/job/Hadoop-Patch/1392/artifact/trunk/build/test/findbugs/newPatchFindbugsWarnings.html
        Checkstyle results: http://lucene.zones.apache.org:8080/hudson/job/Hadoop-Patch/1392/artifact/trunk/build/test/checkstyle-errors.html
        Console output: http://lucene.zones.apache.org:8080/hudson/job/Hadoop-Patch/1392/console

        This message is automatically generated.

        Show
        Hadoop QA added a comment - +1 overall. Here are the results of testing the latest attachment http://issues.apache.org/jira/secure/attachment/12371930/id_v3.patch against trunk revision r605457. @author +1. The patch does not contain any @author tags. javadoc +1. The javadoc tool did not generate any warning messages. javac +1. The applied patch does not generate any new compiler warnings. findbugs +1. The patch does not introduce any new Findbugs warnings. core tests +1. The patch passed core unit tests. contrib tests +1. The patch passed contrib unit tests. Test results: http://lucene.zones.apache.org:8080/hudson/job/Hadoop-Patch/1392/testReport/ Findbugs warnings: http://lucene.zones.apache.org:8080/hudson/job/Hadoop-Patch/1392/artifact/trunk/build/test/findbugs/newPatchFindbugsWarnings.html Checkstyle results: http://lucene.zones.apache.org:8080/hudson/job/Hadoop-Patch/1392/artifact/trunk/build/test/checkstyle-errors.html Console output: http://lucene.zones.apache.org:8080/hudson/job/Hadoop-Patch/1392/console This message is automatically generated.
        Hide
        Enis Soztutar added a comment -

        Patch updated to meet current trunk. The functions are consistently named as
        get

        {Job|Task|TaskInProgress}

        ID() with some minor exceptions, where the change would cause more trouble than the benefits.

        Show
        Enis Soztutar added a comment - Patch updated to meet current trunk. The functions are consistently named as get {Job|Task|TaskInProgress} ID() with some minor exceptions, where the change would cause more trouble than the benefits.
        Hide
        Hadoop QA added a comment -

        -1 overall. Here are the results of testing the latest attachment
        http://issues.apache.org/jira/secure/attachment/12372507/id_v4.patch
        against trunk revision .

        @author +1. The patch does not contain any @author tags.

        javadoc -1. The javadoc tool appears to have generated messages.

        javac +1. The applied patch does not generate any new compiler warnings.

        findbugs -1. The patch appears to cause Findbugs to fail.

        core tests -1. The patch failed core unit tests.

        contrib tests -1. The patch failed contrib unit tests.

        Test results: http://lucene.zones.apache.org:8080/hudson/job/Hadoop-Patch/1483/testReport/
        Checkstyle results: http://lucene.zones.apache.org:8080/hudson/job/Hadoop-Patch/1483/artifact/trunk/build/test/checkstyle-errors.html
        Console output: http://lucene.zones.apache.org:8080/hudson/job/Hadoop-Patch/1483/console

        This message is automatically generated.

        Show
        Hadoop QA added a comment - -1 overall. Here are the results of testing the latest attachment http://issues.apache.org/jira/secure/attachment/12372507/id_v4.patch against trunk revision . @author +1. The patch does not contain any @author tags. javadoc -1. The javadoc tool appears to have generated messages. javac +1. The applied patch does not generate any new compiler warnings. findbugs -1. The patch appears to cause Findbugs to fail. core tests -1. The patch failed core unit tests. contrib tests -1. The patch failed contrib unit tests. Test results: http://lucene.zones.apache.org:8080/hudson/job/Hadoop-Patch/1483/testReport/ Checkstyle results: http://lucene.zones.apache.org:8080/hudson/job/Hadoop-Patch/1483/artifact/trunk/build/test/checkstyle-errors.html Console output: http://lucene.zones.apache.org:8080/hudson/job/Hadoop-Patch/1483/console This message is automatically generated.
        Hide
        Enis Soztutar added a comment -

        Retrying hudson tests, which failed due to HADOOP-2240.

        Show
        Enis Soztutar added a comment - Retrying hudson tests, which failed due to HADOOP-2240 .
        Hide
        Hadoop QA added a comment -

        +1 overall. Here are the results of testing the latest attachment
        http://issues.apache.org/jira/secure/attachment/12372507/id_v4.patch
        against trunk revision .

        @author +1. The patch does not contain any @author tags.

        javadoc +1. The javadoc tool did not generate any warning messages.

        javac +1. The applied patch does not generate any new compiler warnings.

        findbugs +1. The patch does not introduce any new Findbugs warnings.

        core tests +1. The patch passed core unit tests.

        contrib tests +1. The patch passed contrib unit tests.

        Test results: http://lucene.zones.apache.org:8080/hudson/job/Hadoop-Patch/1491/testReport/
        Findbugs warnings: http://lucene.zones.apache.org:8080/hudson/job/Hadoop-Patch/1491/artifact/trunk/build/test/findbugs/newPatchFindbugsWarnings.html
        Checkstyle results: http://lucene.zones.apache.org:8080/hudson/job/Hadoop-Patch/1491/artifact/trunk/build/test/checkstyle-errors.html
        Console output: http://lucene.zones.apache.org:8080/hudson/job/Hadoop-Patch/1491/console

        This message is automatically generated.

        Show
        Hadoop QA added a comment - +1 overall. Here are the results of testing the latest attachment http://issues.apache.org/jira/secure/attachment/12372507/id_v4.patch against trunk revision . @author +1. The patch does not contain any @author tags. javadoc +1. The javadoc tool did not generate any warning messages. javac +1. The applied patch does not generate any new compiler warnings. findbugs +1. The patch does not introduce any new Findbugs warnings. core tests +1. The patch passed core unit tests. contrib tests +1. The patch passed contrib unit tests. Test results: http://lucene.zones.apache.org:8080/hudson/job/Hadoop-Patch/1491/testReport/ Findbugs warnings: http://lucene.zones.apache.org:8080/hudson/job/Hadoop-Patch/1491/artifact/trunk/build/test/findbugs/newPatchFindbugsWarnings.html Checkstyle results: http://lucene.zones.apache.org:8080/hudson/job/Hadoop-Patch/1491/artifact/trunk/build/test/checkstyle-errors.html Console output: http://lucene.zones.apache.org:8080/hudson/job/Hadoop-Patch/1491/console This message is automatically generated.
        Hide
        Enis Soztutar added a comment -

        Feature freeze is soon, and we do not want this huge patch to cause trouble. Moving this to be fixed at 0.17.

        Show
        Enis Soztutar added a comment - Feature freeze is soon, and we do not want this huge patch to cause trouble. Moving this to be fixed at 0.17.
        Hide
        Enis Soztutar added a comment -

        I could not find time to update the patch, moving this to 0.18.

        Show
        Enis Soztutar added a comment - I could not find time to update the patch, moving this to 0.18.
        Hide
        Devaraj Das added a comment -

        Enis, could you please update the patch. This seems like a good time to get it in. Thanks!

        Show
        Devaraj Das added a comment - Enis, could you please update the patch. This seems like a good time to get it in. Thanks!
        Hide
        Enis Soztutar added a comment -

        I think so! I'll update the patch as soon as I can.

        Show
        Enis Soztutar added a comment - I think so! I'll update the patch as soon as I can.
        Hide
        Enis Soztutar added a comment -

        Patch updated to trunk, No major changes from the previous one.

        Show
        Enis Soztutar added a comment - Patch updated to trunk, No major changes from the previous one.
        Hide
        Hadoop QA added a comment -

        +1 overall. Here are the results of testing the latest attachment
        http://issues.apache.org/jira/secure/attachment/12380681/id_v5.patch
        against trunk revision 645773.

        @author +1. The patch does not contain any @author tags.

        tests included +1. The patch appears to include 12 new or modified tests.

        javadoc +1. The javadoc tool did not generate any warning messages.

        javac +1. The applied patch does not generate any new javac compiler warnings.

        release audit +1. The applied patch does not generate any new release audit warnings.

        findbugs +1. The patch does not introduce any new Findbugs warnings.

        core tests +1. The patch passed core unit tests.

        contrib tests +1. The patch passed contrib unit tests.

        Test results: http://hudson.zones.apache.org/hudson/job/Hadoop-Patch/2301/testReport/
        Findbugs warnings: http://hudson.zones.apache.org/hudson/job/Hadoop-Patch/2301/artifact/trunk/build/test/findbugs/newPatchFindbugsWarnings.html
        Checkstyle results: http://hudson.zones.apache.org/hudson/job/Hadoop-Patch/2301/artifact/trunk/build/test/checkstyle-errors.html
        Console output: http://hudson.zones.apache.org/hudson/job/Hadoop-Patch/2301/console

        This message is automatically generated.

        Show
        Hadoop QA added a comment - +1 overall. Here are the results of testing the latest attachment http://issues.apache.org/jira/secure/attachment/12380681/id_v5.patch against trunk revision 645773. @author +1. The patch does not contain any @author tags. tests included +1. The patch appears to include 12 new or modified tests. javadoc +1. The javadoc tool did not generate any warning messages. javac +1. The applied patch does not generate any new javac compiler warnings. release audit +1. The applied patch does not generate any new release audit warnings. findbugs +1. The patch does not introduce any new Findbugs warnings. core tests +1. The patch passed core unit tests. contrib tests +1. The patch passed contrib unit tests. Test results: http://hudson.zones.apache.org/hudson/job/Hadoop-Patch/2301/testReport/ Findbugs warnings: http://hudson.zones.apache.org/hudson/job/Hadoop-Patch/2301/artifact/trunk/build/test/findbugs/newPatchFindbugsWarnings.html Checkstyle results: http://hudson.zones.apache.org/hudson/job/Hadoop-Patch/2301/artifact/trunk/build/test/checkstyle-errors.html Console output: http://hudson.zones.apache.org/hudson/job/Hadoop-Patch/2301/console This message is automatically generated.
        Hide
        Enis Soztutar added a comment -

        Committers, I kindly request that the patch is reviewed ASAP, so that we can include this with minumum overhead. Thanks, smile.

        Show
        Enis Soztutar added a comment - Committers, I kindly request that the patch is reviewed ASAP, so that we can include this with minumum overhead. Thanks, smile .
        Hide
        Owen O'Malley added a comment -

        I'd propose that we change the names of the id classes toward a less confusing system:

        JobId
        TaskId
        TaskAttemptId

        I think it is better not to import each of the values of enumerations, so that you always have the type name associated with the value.

        Thanks for all of the hard work on this Enis!

        Show
        Owen O'Malley added a comment - I'd propose that we change the names of the id classes toward a less confusing system: JobId TaskId TaskAttemptId I think it is better not to import each of the values of enumerations, so that you always have the type name associated with the value. Thanks for all of the hard work on this Enis!
        Hide
        Owen O'Malley added a comment -

        It also changes the signature on some public methods without deprecation. Is there a way to detect those?

        Show
        Owen O'Malley added a comment - It also changes the signature on some public methods without deprecation. Is there a way to detect those?
        Hide
        Owen O'Malley added a comment -

        here is a tar ball generated by jdiff that lists the api changes by this patch. unfortunately, it really needs to be fixed or we will break a bunch of applications.

        Show
        Owen O'Malley added a comment - here is a tar ball generated by jdiff that lists the api changes by this patch. unfortunately, it really needs to be fixed or we will break a bunch of applications.
        Hide
        Enis Soztutar added a comment -

        Owen, thanks for the jdiff, I haven't though of that.
        I checked the differences in the public APIs :

        • methods in CompletedJobStatusStore is changed w/o deprecation, but I think this class should be package private rather than public. Its methods will be accessed though JobClient leveraging facade design pattern, no?
        • API changes in JobClient, RunningJob, TaskCompletionEvent, TaskReport deprecates old ones, no problem here.
        • API changes in JobHistory : I am not sure if JobHistory is really public. There are HistoryViewer, jsp pages etc to view/parse job history so I do not know if it is actually used outside hadoop. If you think it is, then we can fix the patch to keep the old methods as well.
        • Although public, JobProfile, JobStatus, jobSubmissionProtocol(see HADOOP-1643), MapTaskStatus, and ReduceTaskStatus are never directly exposed to the user so the changes in them are acceptable I think.
        • methods in JT and TT, not sure if we need to change them, if so we can fix this.

        The name sets

        {jobid, taskinprogressid, taskid}

        and

        {jobid, taskid, taskattemptid} is already confusing (sometimes even I get confused. ) I think the ultimate naming should be the latter, which makes more sense. However we have been using and exposing both naming schema. For example the task names start with tip_ or task_, we have JobClient#killTask(taskid) which accepts a task attempt id, and such, on the other hand JobHistory uses taskID-taskAttemptID, so the current situation is already inconsistent.

        I suspect if we introduce JobID, TaskID, TaskAttemptID as a replacement of the deprecated JobClient methods, then it will cause some trouble. I propose we stick with the naming schema in this patch, and later in a bigger issue change all the naming (internal and external) to respect {jobid, taskid, taskattemptid}

        (as a side note refactoring TaskID to TaskAttemptID and TaskInProgressID to TaskID will be mind-blowing) . This new issue should be a blocker for 0.18 so that users not using trunk will not see this (current) patch. Thoughts ?

        Show
        Enis Soztutar added a comment - Owen, thanks for the jdiff, I haven't though of that. I checked the differences in the public APIs : methods in CompletedJobStatusStore is changed w/o deprecation, but I think this class should be package private rather than public. Its methods will be accessed though JobClient leveraging facade design pattern, no? API changes in JobClient, RunningJob, TaskCompletionEvent, TaskReport deprecates old ones, no problem here. API changes in JobHistory : I am not sure if JobHistory is really public. There are HistoryViewer, jsp pages etc to view/parse job history so I do not know if it is actually used outside hadoop. If you think it is, then we can fix the patch to keep the old methods as well. Although public, JobProfile, JobStatus, jobSubmissionProtocol(see HADOOP-1643 ), MapTaskStatus, and ReduceTaskStatus are never directly exposed to the user so the changes in them are acceptable I think. methods in JT and TT, not sure if we need to change them, if so we can fix this. The name sets {jobid, taskinprogressid, taskid} and {jobid, taskid, taskattemptid} is already confusing (sometimes even I get confused. ) I think the ultimate naming should be the latter, which makes more sense. However we have been using and exposing both naming schema. For example the task names start with tip_ or task_, we have JobClient#killTask(taskid) which accepts a task attempt id, and such, on the other hand JobHistory uses taskID-taskAttemptID, so the current situation is already inconsistent. I suspect if we introduce JobID, TaskID, TaskAttemptID as a replacement of the deprecated JobClient methods, then it will cause some trouble. I propose we stick with the naming schema in this patch, and later in a bigger issue change all the naming (internal and external) to respect {jobid, taskid, taskattemptid} (as a side note refactoring TaskID to TaskAttemptID and TaskInProgressID to TaskID will be mind-blowing) . This new issue should be a blocker for 0.18 so that users not using trunk will not see this (current) patch. Thoughts ?
        Hide
        Arun C Murthy added a comment -

        Enis, I agree with Owen - I think it would be great if this patch can take the first step and turn the tide towards the right nomenclature. Thanks again for your patience...

        Show
        Arun C Murthy added a comment - Enis, I agree with Owen - I think it would be great if this patch can take the first step and turn the tide towards the right nomenclature. Thanks again for your patience...
        Hide
        Arun C Murthy added a comment -

        +1 for Enis's take on changing public classes which aren't really part of the Map-Reduce's User Interface... we can change

        {Map|Reduce}

        TaskStatus etc. without any harm/fear.

        Show
        Arun C Murthy added a comment - +1 for Enis's take on changing public classes which aren't really part of the Map-Reduce's User Interface... we can change {Map|Reduce} TaskStatus etc. without any harm/fear.
        Hide
        Mukund Madhugiri added a comment -

        I ran Sort benchmark on 500 nodes with the patch ( id_v5.patch) and see these results:

        0.17.0 build: 85.3 mins
        patched trunk build: 80.3 mins

        Show
        Mukund Madhugiri added a comment - I ran Sort benchmark on 500 nodes with the patch ( id_v5.patch) and see these results: 0.17.0 build: 85.3 mins patched trunk build: 80.3 mins
        Hide
        Owen O'Malley added a comment -

        Which matches what I've seen in the profiler on the JobTracker, which is that we spend a lot of time looking up maps from string to X, for lots of different X's.

        Show
        Owen O'Malley added a comment - Which matches what I've seen in the profiler on the JobTracker, which is that we spend a lot of time looking up maps from string to X, for lots of different X's.
        Hide
        Enis Soztutar added a comment -

        A 5% improvement, wow !

        Thanks to the help from Arun, I've updated the patch. This version is a first attempt to build the naming convention.
        Now we have JobID, TaskID, TaskAttemptID classes. The prefixes for these are also changed to "job_", "task_" and "attempt", respectively. Surely any code which parsed these names from the strings will break, but thats exactly why we hereby introduce xxxID classes.

        As a side note, we keep tipId and taskId variable names, in some places. It will be good if they are also refactored, but I prefer to delay this in a later issue.

        Thanks for the benchmark, and the reviews.

        Show
        Enis Soztutar added a comment - A 5% improvement, wow ! Thanks to the help from Arun, I've updated the patch. This version is a first attempt to build the naming convention. Now we have JobID, TaskID, TaskAttemptID classes. The prefixes for these are also changed to "job_", "task_" and "attempt", respectively. Surely any code which parsed these names from the strings will break, but thats exactly why we hereby introduce xxxID classes. As a side note, we keep tipId and taskId variable names, in some places. It will be good if they are also refactored, but I prefer to delay this in a later issue. Thanks for the benchmark, and the reviews.
        Hide
        Hadoop QA added a comment -

        -1 overall. Here are the results of testing the latest attachment
        http://issues.apache.org/jira/secure/attachment/12381106/id_v6.patch
        against trunk revision 645773.

        @author +1. The patch does not contain any @author tags.

        tests included +1. The patch appears to include 12 new or modified tests.

        patch -1. The patch command could not apply the patch.

        Console output: http://hudson.zones.apache.org/hudson/job/Hadoop-Patch/2343/console

        This message is automatically generated.

        Show
        Hadoop QA added a comment - -1 overall. Here are the results of testing the latest attachment http://issues.apache.org/jira/secure/attachment/12381106/id_v6.patch against trunk revision 645773. @author +1. The patch does not contain any @author tags. tests included +1. The patch appears to include 12 new or modified tests. patch -1. The patch command could not apply the patch. Console output: http://hudson.zones.apache.org/hudson/job/Hadoop-Patch/2343/console This message is automatically generated.
        Hide
        Owen O'Malley added a comment -

        I removed the spurious changes to the TaskRunner import statements that were conflicting.

        Show
        Owen O'Malley added a comment - I removed the spurious changes to the TaskRunner import statements that were conflicting.
        Hide
        Hadoop QA added a comment -

        -1 overall. Here are the results of testing the latest attachment
        http://issues.apache.org/jira/secure/attachment/12381151/id_v7.patch
        against trunk revision 645773.

        @author +1. The patch does not contain any @author tags.

        tests included +1. The patch appears to include 12 new or modified tests.

        javadoc +1. The javadoc tool did not generate any warning messages.

        javac +1. The applied patch does not generate any new javac compiler warnings.

        release audit +1. The applied patch does not generate any new release audit warnings.

        findbugs +1. The patch does not introduce any new Findbugs warnings.

        core tests -1. The patch failed core unit tests.

        contrib tests +1. The patch passed contrib unit tests.

        Test results: http://hudson.zones.apache.org/hudson/job/Hadoop-Patch/2346/testReport/
        Findbugs warnings: http://hudson.zones.apache.org/hudson/job/Hadoop-Patch/2346/artifact/trunk/build/test/findbugs/newPatchFindbugsWarnings.html
        Checkstyle results: http://hudson.zones.apache.org/hudson/job/Hadoop-Patch/2346/artifact/trunk/build/test/checkstyle-errors.html
        Console output: http://hudson.zones.apache.org/hudson/job/Hadoop-Patch/2346/console

        This message is automatically generated.

        Show
        Hadoop QA added a comment - -1 overall. Here are the results of testing the latest attachment http://issues.apache.org/jira/secure/attachment/12381151/id_v7.patch against trunk revision 645773. @author +1. The patch does not contain any @author tags. tests included +1. The patch appears to include 12 new or modified tests. javadoc +1. The javadoc tool did not generate any warning messages. javac +1. The applied patch does not generate any new javac compiler warnings. release audit +1. The applied patch does not generate any new release audit warnings. findbugs +1. The patch does not introduce any new Findbugs warnings. core tests -1. The patch failed core unit tests. contrib tests +1. The patch passed contrib unit tests. Test results: http://hudson.zones.apache.org/hudson/job/Hadoop-Patch/2346/testReport/ Findbugs warnings: http://hudson.zones.apache.org/hudson/job/Hadoop-Patch/2346/artifact/trunk/build/test/findbugs/newPatchFindbugsWarnings.html Checkstyle results: http://hudson.zones.apache.org/hudson/job/Hadoop-Patch/2346/artifact/trunk/build/test/checkstyle-errors.html Console output: http://hudson.zones.apache.org/hudson/job/Hadoop-Patch/2346/console This message is automatically generated.
        Hide
        Devaraj Das added a comment -

        Resubmitting the patch. The test failures had something to do with MapReduce

        Show
        Devaraj Das added a comment - Resubmitting the patch. The test failures had something to do with MapReduce
        Hide
        Hadoop QA added a comment -

        -1 overall. Here are the results of testing the latest attachment
        http://issues.apache.org/jira/secure/attachment/12381151/id_v7.patch
        against trunk revision 645773.

        @author +1. The patch does not contain any @author tags.

        tests included +1. The patch appears to include 12 new or modified tests.

        javadoc +1. The javadoc tool did not generate any warning messages.

        javac +1. The applied patch does not generate any new javac compiler warnings.

        release audit +1. The applied patch does not generate any new release audit warnings.

        findbugs +1. The patch does not introduce any new Findbugs warnings.

        core tests -1. The patch failed core unit tests.

        contrib tests +1. The patch passed contrib unit tests.

        Test results: http://hudson.zones.apache.org/hudson/job/Hadoop-Patch/2349/testReport/
        Findbugs warnings: http://hudson.zones.apache.org/hudson/job/Hadoop-Patch/2349/artifact/trunk/build/test/findbugs/newPatchFindbugsWarnings.html
        Checkstyle results: http://hudson.zones.apache.org/hudson/job/Hadoop-Patch/2349/artifact/trunk/build/test/checkstyle-errors.html
        Console output: http://hudson.zones.apache.org/hudson/job/Hadoop-Patch/2349/console

        This message is automatically generated.

        Show
        Hadoop QA added a comment - -1 overall. Here are the results of testing the latest attachment http://issues.apache.org/jira/secure/attachment/12381151/id_v7.patch against trunk revision 645773. @author +1. The patch does not contain any @author tags. tests included +1. The patch appears to include 12 new or modified tests. javadoc +1. The javadoc tool did not generate any warning messages. javac +1. The applied patch does not generate any new javac compiler warnings. release audit +1. The applied patch does not generate any new release audit warnings. findbugs +1. The patch does not introduce any new Findbugs warnings. core tests -1. The patch failed core unit tests. contrib tests +1. The patch passed contrib unit tests. Test results: http://hudson.zones.apache.org/hudson/job/Hadoop-Patch/2349/testReport/ Findbugs warnings: http://hudson.zones.apache.org/hudson/job/Hadoop-Patch/2349/artifact/trunk/build/test/findbugs/newPatchFindbugsWarnings.html Checkstyle results: http://hudson.zones.apache.org/hudson/job/Hadoop-Patch/2349/artifact/trunk/build/test/checkstyle-errors.html Console output: http://hudson.zones.apache.org/hudson/job/Hadoop-Patch/2349/console This message is automatically generated.
        Hide
        Enis Soztutar added a comment -

        Updated version, introduces static xxxID.getxxxIDsPattern(...) methods to obtain regex patterns which match ID strings. Patch fixes the failed tests, which hand-crafted these patterns.

        Show
        Enis Soztutar added a comment - Updated version, introduces static xxxID.getxxxIDsPattern(...) methods to obtain regex patterns which match ID strings. Patch fixes the failed tests, which hand-crafted these patterns.
        Hide
        Hadoop QA added a comment -

        +1 overall. Here are the results of testing the latest attachment
        http://issues.apache.org/jira/secure/attachment/12381173/id_v8.patch
        against trunk revision 645773.

        @author +1. The patch does not contain any @author tags.

        tests included +1. The patch appears to include 15 new or modified tests.

        javadoc +1. The javadoc tool did not generate any warning messages.

        javac +1. The applied patch does not generate any new javac compiler warnings.

        release audit +1. The applied patch does not generate any new release audit warnings.

        findbugs +1. The patch does not introduce any new Findbugs warnings.

        core tests +1. The patch passed core unit tests.

        contrib tests +1. The patch passed contrib unit tests.

        Test results: http://hudson.zones.apache.org/hudson/job/Hadoop-Patch/2351/testReport/
        Findbugs warnings: http://hudson.zones.apache.org/hudson/job/Hadoop-Patch/2351/artifact/trunk/build/test/findbugs/newPatchFindbugsWarnings.html
        Checkstyle results: http://hudson.zones.apache.org/hudson/job/Hadoop-Patch/2351/artifact/trunk/build/test/checkstyle-errors.html
        Console output: http://hudson.zones.apache.org/hudson/job/Hadoop-Patch/2351/console

        This message is automatically generated.

        Show
        Hadoop QA added a comment - +1 overall. Here are the results of testing the latest attachment http://issues.apache.org/jira/secure/attachment/12381173/id_v8.patch against trunk revision 645773. @author +1. The patch does not contain any @author tags. tests included +1. The patch appears to include 15 new or modified tests. javadoc +1. The javadoc tool did not generate any warning messages. javac +1. The applied patch does not generate any new javac compiler warnings. release audit +1. The applied patch does not generate any new release audit warnings. findbugs +1. The patch does not introduce any new Findbugs warnings. core tests +1. The patch passed core unit tests. contrib tests +1. The patch passed contrib unit tests. Test results: http://hudson.zones.apache.org/hudson/job/Hadoop-Patch/2351/testReport/ Findbugs warnings: http://hudson.zones.apache.org/hudson/job/Hadoop-Patch/2351/artifact/trunk/build/test/findbugs/newPatchFindbugsWarnings.html Checkstyle results: http://hudson.zones.apache.org/hudson/job/Hadoop-Patch/2351/artifact/trunk/build/test/checkstyle-errors.html Console output: http://hudson.zones.apache.org/hudson/job/Hadoop-Patch/2351/console This message is automatically generated.
        Hide
        Devaraj Das added a comment -

        I just commited this. Thanks, Enis!

        Show
        Devaraj Das added a comment - I just commited this. Thanks, Enis!
        Hide
        Enis Soztutar added a comment -

        Changed the names of the classes in the release note, I will also fix and recommit CHANGES.txt.

        Show
        Enis Soztutar added a comment - Changed the names of the classes in the release note, I will also fix and recommit CHANGES.txt.
        Hide
        Hudson added a comment -
        Show
        Hudson added a comment - Integrated in Hadoop-trunk #476 (See http://hudson.zones.apache.org/hudson/job/Hadoop-trunk/476/ )
        Hide
        Nigel Daley added a comment -

        For all the discussion about deprectatoin, etc, the patch for this issue has broken trunk! It no longer compiles on Hudson. The contrib eclipse plugin relied on an API that you considered non-Public (JobStatus and perhaps others). Given we have no well defined public API yet, I don't think you can assume that classes declared "public" aren't actually public. At a minimum, the release notes should indicated that methods were removed and not deprecated.

        Build failure:
        http://hudson.zones.apache.org/hudson/view/Hadoop/job/Hadoop-trunk/476/console

        compile:
        [echo] contrib: eclipse-plugin
        [javac] Compiling 45 source files to /zonestorage/hudson/home/hudson/hudson/jobs/Hadoop-trunk/workspace/trunk/build/contrib/eclipse-plugin/classes
        [javac] /zonestorage/hudson/home/hudson/hudson/jobs/Hadoop-trunk/workspace/trunk/src/contrib/eclipse-plugin/src/java/org/apache/hadoop/eclipse/server/HadoopServer.java:119: cannot find symbol
        [javac] symbol : method getJobId()
        [javac] location: class org.apache.hadoop.mapred.JobStatus
        [javac] String jobId = status.getJobId();
        [javac] ^
        [javac] Note: Some input files use or override a deprecated API.
        [javac] Note: Recompile with -Xlint:deprecation for details.
        [javac] Note: Some input files use unchecked or unsafe operations.
        [javac] Note: Recompile with -Xlint:unchecked for details.
        [javac] 1 error

        Can we get this fixed today?

        Show
        Nigel Daley added a comment - For all the discussion about deprectatoin, etc, the patch for this issue has broken trunk! It no longer compiles on Hudson. The contrib eclipse plugin relied on an API that you considered non-Public (JobStatus and perhaps others). Given we have no well defined public API yet, I don't think you can assume that classes declared "public" aren't actually public. At a minimum, the release notes should indicated that methods were removed and not deprecated. Build failure: http://hudson.zones.apache.org/hudson/view/Hadoop/job/Hadoop-trunk/476/console compile: [echo] contrib: eclipse-plugin [javac] Compiling 45 source files to /zonestorage/hudson/home/hudson/hudson/jobs/Hadoop-trunk/workspace/trunk/build/contrib/eclipse-plugin/classes [javac] /zonestorage/hudson/home/hudson/hudson/jobs/Hadoop-trunk/workspace/trunk/src/contrib/eclipse-plugin/src/java/org/apache/hadoop/eclipse/server/HadoopServer.java:119: cannot find symbol [javac] symbol : method getJobId() [javac] location: class org.apache.hadoop.mapred.JobStatus [javac] String jobId = status.getJobId(); [javac] ^ [javac] Note: Some input files use or override a deprecated API. [javac] Note: Recompile with -Xlint:deprecation for details. [javac] Note: Some input files use unchecked or unsafe operations. [javac] Note: Recompile with -Xlint:unchecked for details. [javac] 1 error Can we get this fixed today?
        Hide
        Nigel Daley added a comment -

        I opened HADOOP-3338 to address the compilation failure on trunk

        Show
        Nigel Daley added a comment - I opened HADOOP-3338 to address the compilation failure on trunk
        Hide
        Hudson added a comment -
        Show
        Hudson added a comment - Integrated in Hadoop-trunk #479 (See http://hudson.zones.apache.org/hudson/job/Hadoop-trunk/479/ )
        Hide
        Hudson added a comment -
        Show
        Hudson added a comment - Integrated in Hadoop-trunk #581 (See http://hudson.zones.apache.org/hudson/job/Hadoop-trunk/581/ )

          People

          • Assignee:
            Enis Soztutar
            Reporter:
            Owen O'Malley
          • Votes:
            0 Vote for this issue
            Watchers:
            0 Start watching this issue

            Dates

            • Created:
              Updated:
              Resolved:

              Development