Issue Details (XML | Word | Printable)

Key: HADOOP-4276
Type: Improvement Improvement
Status: Closed Closed
Resolution: Fixed
Priority: Major Major
Assignee: Owen O'Malley
Reporter: Owen O'Malley
Votes: 0
Watchers: 3
Operations

If you were logged in you would be able to see more operations.
Hadoop Common

The mapred.*ID classes are inefficient for hashCode and serialization

Created: 25/Sep/08 04:37 PM   Updated: 23/Apr/09 07:17 PM
Return to search
Component/s: None
Affects Version/s: None
Fix Version/s: 0.20.0

Time Tracking:
Not Specified

File Attachments:
  Size
Text File Licensed for inclusion in ASF works h4276.patch 2008-10-23 11:08 AM Owen O'Malley 29 kB
Text File Licensed for inclusion in ASF works h4276.patch 2008-10-20 05:36 AM Owen O'Malley 27 kB

Hadoop Flags: Reviewed
Resolution Date: 23/Oct/08 09:08 PM


 Description  « Hide
Currently the ID classes call toString and hash the resulting string rather than computing a hash directly.

The ID classes also create new instances of the higher level object in readFields (via read) rather than re-using the object via readFields.



 All   Comments   Work Log   Change History   Subversion Commits      Sort Order: Ascending order - Click to sort in descending order
Owen O'Malley added a comment - 20/Oct/08 05:36 AM
This patch:
1. Removes the string generation during hashing of id objects.
2. Reuses the id objects during readFields.
3. Defines a protected field for SEPARATOR and removes UNDERLINE.
4. Replace the toStringWOPrefix methods with addId that will reuse the same StringBuilder, which is more efficient.
5. Store the jtIdentifier as Text so that it doesn't need to be encoded for sending across RPC.

Hadoop QA added a comment - 21/Oct/08 01:00 AM
-1 overall. Here are the results of testing the latest attachment
http://issues.apache.org/jira/secure/attachment/12392453/h4276.patch
against trunk revision 706417.

+1 @author. The patch does not contain any @author tags.

-1 tests included. The patch doesn't appear to include any new or modified tests.
Please justify why no tests are needed for this patch.

+1 javadoc. The javadoc tool did not generate any warning messages.

+1 javac. The applied patch does not increase the total number of javac compiler warnings.

+1 findbugs. The patch does not introduce any new Findbugs warnings.

+1 Eclipse classpath. The patch retains Eclipse classpath integrity.

+1 core tests. The patch passed core unit tests.

+1 contrib tests. The patch passed contrib unit tests.

Test results: http://hudson.zones.apache.org/hudson/job/Hadoop-Patch/3501/testReport/
Findbugs warnings: http://hudson.zones.apache.org/hudson/job/Hadoop-Patch/3501/artifact/trunk/build/test/findbugs/newPatchFindbugsWarnings.html
Checkstyle results: http://hudson.zones.apache.org/hudson/job/Hadoop-Patch/3501/artifact/trunk/build/test/checkstyle-errors.html
Console output: http://hudson.zones.apache.org/hudson/job/Hadoop-Patch/3501/console

This message is automatically generated.


Chris Douglas added a comment - 22/Oct/08 08:25 PM
This looks good. Just a few suggestions/nits:
  • In JobID:
    -      .append(jtIdentifier != null ? jtIdentifier : "[^_]*").append(UNDERSCORE)
    +      .append(jtIdentifier != null ? jtIdentifier : "[^_]*").append(SEPARATOR)
    

    the regexp "[^_]" should probably use the SEPARATOR constant

  • Where this replaces calls to ID factories with instances created in the cstr (JobProfile, TaskReport, TaskStatus, TaskCompletionEvent, TaskAttemptID, Task, KillTaskAction, KillJobAction, JobStatus) it might make sense to make the instance final
  • In TaskID:
    -      else return this.isMap ? -1 : 1;
    +      else {
    +        return this.isMap ? -1 : 1;
    +      }
    

    The else is redundant

  • addId reads like a mutator. Would addIdTo or appendIdTo make more sense?

Owen O'Malley added a comment - 23/Oct/08 11:08 AM
Addressed Chris' comments, except that I left the redundant else, which makes the return paths symmetric. I also used appendTo instead of appendIdTo, which seems more awkward.

Chris Douglas added a comment - 23/Oct/08 08:43 PM
+1

Owen O'Malley added a comment - 23/Oct/08 09:08 PM
I just committed this.

Hudson added a comment - 24/Oct/08 01:53 PM
Integrated in Hadoop-trunk #641 (See http://hudson.zones.apache.org/hudson/job/Hadoop-trunk/641/)
. Improve the hashing functions and deserialization of the
mapred ID classes. (omalley)