Issue Details (XML | Word | Printable)

Key: HADOOP-4649
Type: Improvement Improvement
Status: Closed Closed
Resolution: Fixed
Priority: Minor Minor
Assignee: Chris Douglas
Reporter: Chris Douglas
Votes: 0
Watchers: 0
Operations

If you were logged in you would be able to see more operations.
Hadoop Common

Improve abstraction for spill indices

Created: 13/Nov/08 03:18 AM   Updated: 08/Jul/09 04:53 PM
Return to search
Component/s: None
Affects Version/s: None
Fix Version/s: 0.20.0

Time Tracking:
Not Specified

File Attachments:
  Size
Text File Licensed for inclusion in ASF works 4649-0.patch 2008-11-13 03:19 AM Chris Douglas 33 kB
Text File Licensed for inclusion in ASF works 4649-1.patch 2008-11-30 12:52 AM Chris Douglas 33 kB
Text File Licensed for inclusion in ASF works 4649-2.patch 2008-12-01 07:42 PM Chris Douglas 33 kB

Hadoop Flags: Reviewed
Resolution Date: 02/Dec/08 06:49 AM


 Description  « Hide
In support of changing checksum handling as part of the migration to Jetty6, some of the spill code would be easier to reason about with a different abstraction.

 All   Comments   Work Log   Change History   Subversion Commits      Sort Order: Ascending order - Click to sort in descending order
Hadoop QA added a comment - 13/Nov/08 07:39 PM
+1 overall. Here are the results of testing the latest attachment
http://issues.apache.org/jira/secure/attachment/12393834/4649-0.patch
against trunk revision 713723.

+1 @author. The patch does not contain any @author tags.

+1 tests included. The patch appears to include 3 new or modified tests.

+1 javadoc. The javadoc tool did not generate any warning messages.

+1 javac. The applied patch does not increase the total number of javac compiler warnings.

+1 findbugs. The patch does not introduce any new Findbugs warnings.

+1 Eclipse classpath. The patch retains Eclipse classpath integrity.

+1 core tests. The patch passed core unit tests.

+1 contrib tests. The patch passed contrib unit tests.

Test results: http://hudson.zones.apache.org/hudson/job/Hadoop-Patch/3585/testReport/
Findbugs warnings: http://hudson.zones.apache.org/hudson/job/Hadoop-Patch/3585/artifact/trunk/build/test/findbugs/newPatchFindbugsWarnings.html
Checkstyle results: http://hudson.zones.apache.org/hudson/job/Hadoop-Patch/3585/artifact/trunk/build/test/checkstyle-errors.html
Console output: http://hudson.zones.apache.org/hudson/job/Hadoop-Patch/3585/console

This message is automatically generated.


Chris Douglas added a comment - 30/Nov/08 12:52 AM
Merged with trunk
     [exec] +1 overall.  

     [exec]     +1 @author.  The patch does not contain any @author tags.

     [exec]     +1 tests included.  The patch appears to include 3 new or modified tests.

     [exec]     +1 javadoc.  The javadoc tool did not generate any warning messages.

     [exec]     +1 javac.  The applied patch does not increase the total number of javac compiler warnings.

     [exec]     +1 findbugs.  The patch does not introduce any new Findbugs warnings.

     [exec]     +1 Eclipse classpath. The patch retains Eclipse classpath integrity.

Jothi Padmanabhan added a comment - 01/Dec/08 11:49 AM
Looks pretty good.
  • Is there a reason why you do not use IFileInputStream/IFileOuptutStream in SpillRecord? The CRC addition/vadlidation code is very similar to that in IFIleInput/Output Streams. If at a later stage, we decide to use a different checksum other than CRC32 in IFile, we should modify here as well, if we just want to be consistent. Agreed that it will not break anything if this continues to use CRC32, but it might just be better to use the same as in IFileInput/Output stream, unless there is a compelling reason otherwise.
  • We do not expect this change to have any performance impact, but could you verify just in case, if you have not already.
  • A few other minor points
  1. SpillRecord.java: import org.apache.hadoop.fs.LocalFileSystem is not used
  2. IndexCache.java import java.io.FileNotFoundException is not used
  3. MapTask.java:1230, rawSegmentLength is not used anywhere

Chris Douglas added a comment - 01/Dec/08 07:42 PM
Thanks for the review.

Is there a reason why you do not use IFileInputStream/IFileOuptutStream in SpillRecord?

Because it's not an IFile? You're right, the java.util.zip.Checked*Streams and the current IFile*Streams are very close, but the SpillRecords have different access patterns, average sizes, etc. compared to IFiles and I could think of no particular reason why changes to one should influence the other. If the checksum algorithm for IFiles were to change, it would likely be something like Adler32- faster than CRC32, but insufficient for small input, like SpillRecords. Most of the modifications already discussed for IFile that might go into IFile*Stream- record count, for example- would be awkward adaptations for SpillRecords. Separating the two was one of the motivations for the patch. Did you have a use case in mind or is there another benefit to using the IFile*Streams?

We do not expect this change to have any performance impact, but could you verify just in case, if you have not already.

Sure. I'll bounce this if the results are adverse, but I think we can assume that any impact isn't measurable.

Attached a patch removing the dead code you identified. The previous test/test-patch results remain valid.


Jothi Padmanabhan added a comment - 02/Dec/08 05:15 AM
Currently, the IFile*Streams do exactly what you want for SpillRecords - namely create/validate a checksum at the end of the file. So, to avoid duplication of code, we could just use them here.
If at a later stage, if we find that the CRC algorithm for IFIle is affecting the performance for SpillRecord or we change these streams when modifying the IFile layout, we could revisit this, no?

I am fine with the existing patch but would like to avoid duplication of the code, if we could. I am +1 otherwise.


Chris Douglas added a comment - 02/Dec/08 06:49 AM
I just committed this.

Hudson added a comment - 03/Dec/08 02:32 PM