
Key: HADOOP-3828
Type: New Feature
Status: Closed
Resolution: Fixed
Priority: Major
Assignee: Sharad Agarwal
Reporter: Sharad Agarwal
Votes: 0
Watchers: 1

Hadoop Common

Write skipped records' bytes to DFS

Created: 25/Jul/08 11:05 AM   Updated: 08/Jul/09 04:52 PM
Component/s: None
Affects Version/s: None
Fix Version/s: 0.19.0

Time Tracking:
Not Specified

File Attachments (all licensed for inclusion in ASF works):
  File            Date                  Author          Size
  3828_v1.patch   2008-08-19 02:03 PM   Sharad Agarwal  26 kB
  3828_v2.patch   2008-08-20 10:40 AM   Sharad Agarwal  27 kB
  3828_v3.patch   2008-08-26 06:37 AM   Sharad Agarwal  23 kB
  3828_v4.patch   2008-08-28 01:32 PM   Sharad Agarwal  25 kB

Hadoop Flags: Reviewed
Release Note: Skipped records can optionally be written to HDFS. See org.apache.hadoop.mapred.SkipBadRecords.setSkipOutputPath for setting the output path.
Resolution Date: 29/Aug/08 08:20 AM


Description
This is an incremental step over HADOOP-153, which provides the base skipping functionality.

Sharad Agarwal added a comment - 19/Aug/08 02:03 PM
This works as follows:
  • Each skipped record (key, value) is written to a SequenceFile.
  • By default, skipped records are written to the folder "_skip" in the output dir. This is configurable using SkipBadRecords.setSkipOutputPath.

The patch also:
  • Fixes a corner case by initializing the variable "skipping" in TaskInProgress.
  • Changes SortedRanges: makes it cloneable and fixes serialization of a member variable.
  • Cleans up MapTask by using a different implementation of RecordReader for normal mode (skipping=false).


Sharad Agarwal added a comment - 20/Aug/08 10:40 AM
Fixed the counter REDUCE_SKIPPED_RECORDS and added REDUCE_SKIPPED_GROUPS.

Hadoop QA added a comment - 22/Aug/08 02:38 PM
-1 overall. Here are the results of testing the latest attachment
http://issues.apache.org/jira/secure/attachment/12388592/3828_v2.patch
against trunk revision 687868.

+1 @author. The patch does not contain any @author tags.

+1 tests included. The patch appears to include 6 new or modified tests.

-1 javadoc. The javadoc tool appears to have generated 1 warning messages.

+1 javac. The applied patch does not increase the total number of javac compiler warnings.

-1 findbugs. The patch appears to introduce 4 new Findbugs warnings.

+1 release audit. The applied patch does not increase the total number of release audit warnings.

-1 core tests. The patch failed core unit tests.

-1 contrib tests. The patch failed contrib unit tests.

Test results: http://hudson.zones.apache.org/hudson/job/Hadoop-Patch/3083/testReport/
Findbugs warnings: http://hudson.zones.apache.org/hudson/job/Hadoop-Patch/3083/artifact/trunk/build/test/findbugs/newPatchFindbugsWarnings.html
Checkstyle results: http://hudson.zones.apache.org/hudson/job/Hadoop-Patch/3083/artifact/trunk/build/test/checkstyle-errors.html
Console output: http://hudson.zones.apache.org/hudson/job/Hadoop-Patch/3083/console

This message is automatically generated.


Sharad Agarwal added a comment - 26/Aug/08 06:37 AM
  • Fixed Findbugs warnings.
  • The default location of the skip directory is now {outputDir}/_logs/skip.
  • Incorporated offline comments by Devaraj to remove changes unrelated to this patch.
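
For readers following the thread, the fallback to the default skip directory can be pictured roughly as below. This is only an illustrative sketch, not code from the patch: the class SkipPathSketch, the method resolveSkipPath, and the use of plain strings instead of Hadoop Path objects are all assumptions made for the example. The only things taken from this issue are the default location under the output directory and the fact that an explicit path can be set (via SkipBadRecords.setSkipOutputPath).

```java
import java.util.Optional;

/** Illustrative sketch only; hypothetical names, not part of Hadoop. */
public class SkipPathSketch {

    /** Default location relative to the job output dir, as stated in this issue. */
    public static final String DEFAULT_SKIP_SUBDIR = "_logs/skip";

    /**
     * Returns the explicitly configured skip path when one is given,
     * otherwise falls back to {outputDir}/_logs/skip.
     */
    public static String resolveSkipPath(String outputDir, Optional<String> configured) {
        if (configured.isPresent()) {
            return configured.get();
        }
        // Drop a single trailing slash so the joined path has no "//".
        String base = outputDir.endsWith("/")
                ? outputDir.substring(0, outputDir.length() - 1)
                : outputDir;
        return base + "/" + DEFAULT_SKIP_SUBDIR;
    }

    public static void main(String[] args) {
        // No explicit path configured: the default under the output dir is used.
        System.out.println(resolveSkipPath("/user/alice/out", Optional.empty()));
        // Explicit path configured: it takes precedence over the default.
        System.out.println(resolveSkipPath("/user/alice/out", Optional.of("/tmp/skipped")));
    }
}
```

In a real job the explicit path would be supplied through SkipBadRecords.setSkipOutputPath rather than passed around as a string; the sketch only shows the precedence between the two.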

Hadoop QA added a comment - 27/Aug/08 03:39 AM
-1 overall. Here are the results of testing the latest attachment
http://issues.apache.org/jira/secure/attachment/12388896/3828_v3.patch
against trunk revision 689333.

+1 @author. The patch does not contain any @author tags.

+1 tests included. The patch appears to include 6 new or modified tests.

+1 javadoc. The javadoc tool did not generate any warning messages.

+1 javac. The applied patch does not increase the total number of javac compiler warnings.

+1 findbugs. The patch does not introduce any new Findbugs warnings.

+1 release audit. The applied patch does not increase the total number of release audit warnings.

-1 core tests. The patch failed core unit tests.

-1 contrib tests. The patch failed contrib unit tests.

Test results: http://hudson.zones.apache.org/hudson/job/Hadoop-Patch/3114/testReport/
Findbugs warnings: http://hudson.zones.apache.org/hudson/job/Hadoop-Patch/3114/artifact/trunk/build/test/findbugs/newPatchFindbugsWarnings.html
Checkstyle results: http://hudson.zones.apache.org/hudson/job/Hadoop-Patch/3114/artifact/trunk/build/test/checkstyle-errors.html
Console output: http://hudson.zones.apache.org/hudson/job/Hadoop-Patch/3114/console

This message is automatically generated.


Sharad Agarwal added a comment - 28/Aug/08 11:29 AM
Test failure not due to this patch. TestMiniMRDFSSort.testMapReduceSort is failing due to HADOOP-3950.

Devaraj Das added a comment - 28/Aug/08 01:00 PM
Minor comments:
1) The SortedRanges class should be made package private (though this is not directly related to the patch).
2) It may make sense to enable compression for the records written to the DFS.
3) In some places the indentation needs to be fixed.

Sharad Agarwal added a comment - 28/Aug/08 01:32 PM
Incorporated Devaraj's comments.

Hadoop QA added a comment - 28/Aug/08 08:04 PM
-1 overall. Here are the results of testing the latest attachment
http://issues.apache.org/jira/secure/attachment/12389090/3828_v4.patch
against trunk revision 689913.

+1 @author. The patch does not contain any @author tags.

+1 tests included. The patch appears to include 6 new or modified tests.

+1 javadoc. The javadoc tool did not generate any warning messages.

+1 javac. The applied patch does not increase the total number of javac compiler warnings.

+1 findbugs. The patch does not introduce any new Findbugs warnings.

+1 release audit. The applied patch does not increase the total number of release audit warnings.

-1 core tests. The patch failed core unit tests.

+1 contrib tests. The patch passed contrib unit tests.

Test results: http://hudson.zones.apache.org/hudson/job/Hadoop-Patch/3138/testReport/
Findbugs warnings: http://hudson.zones.apache.org/hudson/job/Hadoop-Patch/3138/artifact/trunk/build/test/findbugs/newPatchFindbugsWarnings.html
Checkstyle results: http://hudson.zones.apache.org/hudson/job/Hadoop-Patch/3138/artifact/trunk/build/test/checkstyle-errors.html
Console output: http://hudson.zones.apache.org/hudson/job/Hadoop-Patch/3138/console

This message is automatically generated.


Devaraj Das added a comment - 29/Aug/08 08:20 AM
I just committed this. Thanks, Sharad!

Hudson added a comment - 01/Sep/08 03:45 PM