Issue Details (XML | Word | Printable)

Key: HADOOP-4699
Type: Improvement Improvement
Status: Closed Closed
Resolution: Fixed
Priority: Major Major
Assignee: Chris Douglas
Reporter: Chris Douglas
Votes: 0
Watchers: 1
Operations

If you were logged in you would be able to see more operations.
Hadoop Common

Change TaskTracker.MapOutputServlet to send only the IFile segment, validate checksum in Reduce

Created: 20/Nov/08 10:38 PM   Updated: 23/Apr/09 07:17 PM
Return to search
Component/s: None
Affects Version/s: None
Fix Version/s: 0.20.0

Time Tracking:
Not Specified

File Attachments:
  Size
Text File Licensed for inclusion in ASF works 4699-0.patch 2008-12-03 09:50 AM Chris Douglas 6 kB
Issue Links:
Blocker

Hadoop Flags: Reviewed
Resolution Date: 12/Dec/08 07:58 PM


 Description  « Hide
Instead of validating the checksum of the IFile segment in MapOutputServlet, validation may be left to the reduce. While failures may not be detected until late in the reduce, the throughput and CPU improvements should make up for it in the average case.

 All   Comments   Work Log   Change History   Subversion Commits      Sort Order: Ascending order - Click to sort in descending order
Jothi Padmanabhan added a comment - 21/Nov/08 04:03 AM
This makes sense. However, I think we should do a measurement of the performance overhead experimentally to validate the theory. If there is no significant overhead due to checksum validation, we might be better off doing it in the Servlet as well to help prevent transmission of the corrupted output. No?

Jothi Padmanabhan added a comment - 21/Nov/08 04:18 AM
Disregard my previous comment, if we use NIO, we would not be reading the file at the servlet all and so cannot validate the checksum.

Chris Douglas added a comment - 03/Dec/08 09:50 AM
Removes the checksum validation from MapOutputServlet

Hadoop QA added a comment - 08/Dec/08 09:12 PM
-1 overall. Here are the results of testing the latest attachment
http://issues.apache.org/jira/secure/attachment/12395171/4699-0.patch
against trunk revision 724229.

+1 @author. The patch does not contain any @author tags.

-1 tests included. The patch doesn't appear to include any new or modified tests.
Please justify why no tests are needed for this patch.

+1 javadoc. The javadoc tool did not generate any warning messages.

+1 javac. The applied patch does not increase the total number of javac compiler warnings.

+1 findbugs. The patch does not introduce any new Findbugs warnings.

+1 Eclipse classpath. The patch retains Eclipse classpath integrity.

+1 core tests. The patch passed core unit tests.

+1 contrib tests. The patch passed contrib unit tests.

Test results: http://hudson.zones.apache.org/hudson/job/Hadoop-Patch/3688/testReport/
Findbugs warnings: http://hudson.zones.apache.org/hudson/job/Hadoop-Patch/3688/artifact/trunk/build/test/findbugs/newPatchFindbugsWarnings.html
Checkstyle results: http://hudson.zones.apache.org/hudson/job/Hadoop-Patch/3688/artifact/trunk/build/test/checkstyle-errors.html
Console output: http://hudson.zones.apache.org/hudson/job/Hadoop-Patch/3688/console

This message is automatically generated.


Chris Douglas added a comment - 08/Dec/08 10:12 PM
HADOOP-4688 validates this patch. It wasn't committed before Hudson ran, but it passes on my machine.

Jothi Padmanabhan added a comment - 10/Dec/08 04:29 AM
Changes look fine.
I see that you have removed the Debug statements introduced by Arun for HADOOP-3647. Is it no longer required?
And curious, did you benchmark this patch against trunk for performance?

Chris Douglas added a comment - 10/Dec/08 06:41 AM

I see that you have removed the Debug statements introduced by Arun for HADOOP-3647. Is it no longer required?

HADOOP-4754 was going to address this directly, but it would have conflicted with other patches, so I abandoned it. If you think it merits a separate issue, then I can rework both patches.

And curious, did you benchmark this patch against trunk for performance?

No. It probably won't have a measurable impact.


Jothi Padmanabhan added a comment - 10/Dec/08 06:49 AM

If you think it merits a separate issue, then I can rework both patches.

If the problem underlying HADOOP-3647 is resolved, I am fine with just removing the debug statements here itself.


Jothi Padmanabhan added a comment - 10/Dec/08 08:58 AM
+1

Chris Douglas added a comment - 12/Dec/08 07:58 PM
I just committed this.