Details

    • Type: New Feature
    • Status: Closed
    • Priority: Major
    • Resolution: Fixed
    • Affects Version/s: None
    • Fix Version/s: 0.19.0
    • Component/s: None
    • Labels:
      None
    • Hadoop Flags:
      Reviewed
    • Release Note:
      Skipped records can optionally be written to HDFS. Refer to org.apache.hadoop.mapred.SkipBadRecords.setSkipOutputPath for setting the output path.

      Description

      This is an incremental step over HADOOP-153, which provides the base skipping functionality.

      1. 3828_v1.patch
        26 kB
        Sharad Agarwal
      2. 3828_v2.patch
        27 kB
        Sharad Agarwal
      3. 3828_v3.patch
        23 kB
        Sharad Agarwal
      4. 3828_v4.patch
        25 kB
        Sharad Agarwal

          Activity

          Sharad Agarwal added a comment -

          This works as follows:
          Each skipped record (key, value) is written to a SequenceFile.
          By default the skipped records are written to the folder "_skip" in the output dir. This is configurable using SkipBadRecords.setSkipOutputPath.

          - The patch also fixes a corner case by initializing the variable "skipping" in TaskInProgress.
          - It also makes some changes in SortedRanges: the class is now cloneable, and serialization of a member variable is fixed.
          - Cleanup in MapTask: normal mode (skipping=false) now uses a different implementation of RecordReader.
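
The behaviour described above (divert known-bad records to a side output instead of passing them to the user code) can be sketched in plain Java. This is only an illustration under stated assumptions, not the code from the patch: the names SkippingReaderSketch, Range, SkippingIterator and process are hypothetical, an in-memory list stands in for the SequenceFile, and record indices stand in for whatever SortedRanges tracks.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Iterator;
import java.util.List;
import java.util.NoSuchElementException;

// Hypothetical sketch: none of these names come from the patch.
class SkippingReaderSketch {

    /** Half-open range [start, end) of record indices assumed to be bad. */
    static final class Range {
        final long start;
        final long end;

        Range(long start, long end) {
            this.start = start;
            this.end = end;
        }

        boolean contains(long index) {
            return index >= start && index < end;
        }
    }

    /** Iterates over good records; records inside a skip range go to skipSink instead. */
    static final class SkippingIterator<T> implements Iterator<T> {
        private final Iterator<T> source;
        private final List<Range> skipRanges;
        private final List<T> skipSink;
        private long index = 0;
        private T pending;
        private boolean hasPending;

        SkippingIterator(Iterator<T> source, List<Range> skipRanges, List<T> skipSink) {
            this.source = source;
            this.skipRanges = skipRanges;
            this.skipSink = skipSink;
            advance();
        }

        // Reads ahead to the next record that lies outside every skip range.
        private void advance() {
            hasPending = false;
            pending = null;
            while (source.hasNext()) {
                T candidate = source.next();
                long current = index++;
                if (inSkipRange(current)) {
                    skipSink.add(candidate); // stand-in for appending to a SequenceFile
                } else {
                    pending = candidate;
                    hasPending = true;
                    return;
                }
            }
        }

        private boolean inSkipRange(long i) {
            for (Range r : skipRanges) {
                if (r.contains(i)) {
                    return true;
                }
            }
            return false;
        }

        @Override
        public boolean hasNext() {
            return hasPending;
        }

        @Override
        public T next() {
            if (!hasPending) {
                throw new NoSuchElementException();
            }
            T result = pending;
            advance();
            return result;
        }
    }

    /** Drains the iterator and returns the records that were actually processed. */
    static <T> List<T> process(List<T> records, List<Range> skipRanges, List<T> skipSink) {
        List<T> processed = new ArrayList<>();
        Iterator<T> it = new SkippingIterator<>(records.iterator(), skipRanges, skipSink);
        while (it.hasNext()) {
            processed.add(it.next());
        }
        return processed;
    }

    public static void main(String[] args) {
        List<String> skipped = new ArrayList<>();
        List<String> processed = process(
            Arrays.asList("a", "b", "c", "d", "e"),
            Arrays.asList(new Range(1, 3)),
            skipped);
        System.out.println("processed=" + processed + " skipped=" + skipped);
    }
}
```

Keeping the skipped records in a separate sink, rather than dropping them, is what lets them be inspected or reprocessed after the job finishes.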

          Sharad Agarwal added a comment -

          Fixed the counter REDUCE_SKIPPED_RECORDS and added REDUCE_SKIPPED_GROUPS.
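
As a rough illustration of how the two counters could relate, the sketch below assumes that skipping one reduce key group adds 1 to REDUCE_SKIPPED_GROUPS and the number of values in that group to REDUCE_SKIPPED_RECORDS. That reading is an assumption, not something stated in this comment, and SkipCounterSketch with its methods is a hypothetical stand-in rather than Hadoop's counter API.

```java
import java.util.Arrays;
import java.util.EnumMap;
import java.util.List;
import java.util.Map;

// Hypothetical sketch: only the two counter names are taken from the comment above.
class SkipCounterSketch {

    enum Counter { REDUCE_SKIPPED_GROUPS, REDUCE_SKIPPED_RECORDS }

    private final Map<Counter, Long> counts = new EnumMap<>(Counter.class);

    void increment(Counter counter, long amount) {
        counts.merge(counter, amount, Long::sum);
    }

    long get(Counter counter) {
        return counts.getOrDefault(counter, 0L);
    }

    // Assumption: one skipped key group counts once as a group
    // and once per value as a record.
    void recordSkippedGroup(List<?> values) {
        increment(Counter.REDUCE_SKIPPED_GROUPS, 1);
        increment(Counter.REDUCE_SKIPPED_RECORDS, values.size());
    }

    public static void main(String[] args) {
        SkipCounterSketch counters = new SkipCounterSketch();
        counters.recordSkippedGroup(Arrays.asList("v1", "v2", "v3"));
        counters.recordSkippedGroup(Arrays.asList("v4", "v5"));
        System.out.println("groups=" + counters.get(Counter.REDUCE_SKIPPED_GROUPS)
            + " records=" + counters.get(Counter.REDUCE_SKIPPED_RECORDS));
    }
}
```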

          Hadoop QA added a comment -

          -1 overall. Here are the results of testing the latest attachment
          http://issues.apache.org/jira/secure/attachment/12388592/3828_v2.patch
          against trunk revision 687868.

          +1 @author. The patch does not contain any @author tags.

          +1 tests included. The patch appears to include 6 new or modified tests.

          -1 javadoc. The javadoc tool appears to have generated 1 warning messages.

          +1 javac. The applied patch does not increase the total number of javac compiler warnings.

          -1 findbugs. The patch appears to introduce 4 new Findbugs warnings.

          +1 release audit. The applied patch does not increase the total number of release audit warnings.

          -1 core tests. The patch failed core unit tests.

          -1 contrib tests. The patch failed contrib unit tests.

          Test results: http://hudson.zones.apache.org/hudson/job/Hadoop-Patch/3083/testReport/
          Findbugs warnings: http://hudson.zones.apache.org/hudson/job/Hadoop-Patch/3083/artifact/trunk/build/test/findbugs/newPatchFindbugsWarnings.html
          Checkstyle results: http://hudson.zones.apache.org/hudson/job/Hadoop-Patch/3083/artifact/trunk/build/test/checkstyle-errors.html
          Console output: http://hudson.zones.apache.org/hudson/job/Hadoop-Patch/3083/console

          This message is automatically generated.

          Sharad Agarwal added a comment -
          • Fixed findbugs warnings.
          • The default location of the skip directory is now {outputDir}/_logs/skip.

          • Incorporated offline comments by Devaraj to remove changes unrelated to this patch.
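
The default-location rule mentioned in this comment can be sketched as a small path-resolution helper. This is a minimal illustration, not the patch's implementation: the class SkipOutputPathSketch and the method resolveSkipOutputPath are hypothetical, plain strings stand in for Hadoop paths, and returning null when no output directory is set is an assumption made for the example.

```java
// Hypothetical sketch: only the "_logs/skip" default comes from the comment above.
class SkipOutputPathSketch {

    static final String DEFAULT_SKIP_SUBDIR = "_logs/skip";

    /**
     * Returns the explicitly configured skip path if one is given,
     * otherwise {outputDir}/_logs/skip, or null when neither is available.
     */
    static String resolveSkipOutputPath(String jobOutputDir, String configuredSkipPath) {
        if (configuredSkipPath != null && !configuredSkipPath.isEmpty()) {
            return configuredSkipPath; // an explicit setting wins over the default
        }
        if (jobOutputDir == null || jobOutputDir.isEmpty()) {
            return null; // assumption: nowhere to write skipped records
        }
        // Drop a trailing slash so the joined path has exactly one separator.
        String base = jobOutputDir.endsWith("/")
            ? jobOutputDir.substring(0, jobOutputDir.length() - 1)
            : jobOutputDir;
        return base + "/" + DEFAULT_SKIP_SUBDIR;
    }

    public static void main(String[] args) {
        System.out.println(resolveSkipOutputPath("/user/job/out", null));
        System.out.println(resolveSkipOutputPath("/user/job/out", "/user/job/bad-records"));
    }
}
```

Placing the default under the job's _logs directory keeps the skipped records alongside the job output without mixing them into the regular output files.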
          Hadoop QA added a comment -

          -1 overall. Here are the results of testing the latest attachment
          http://issues.apache.org/jira/secure/attachment/12388896/3828_v3.patch
          against trunk revision 689333.

          +1 @author. The patch does not contain any @author tags.

          +1 tests included. The patch appears to include 6 new or modified tests.

          +1 javadoc. The javadoc tool did not generate any warning messages.

          +1 javac. The applied patch does not increase the total number of javac compiler warnings.

          +1 findbugs. The patch does not introduce any new Findbugs warnings.

          +1 release audit. The applied patch does not increase the total number of release audit warnings.

          -1 core tests. The patch failed core unit tests.

          -1 contrib tests. The patch failed contrib unit tests.

          Test results: http://hudson.zones.apache.org/hudson/job/Hadoop-Patch/3114/testReport/
          Findbugs warnings: http://hudson.zones.apache.org/hudson/job/Hadoop-Patch/3114/artifact/trunk/build/test/findbugs/newPatchFindbugsWarnings.html
          Checkstyle results: http://hudson.zones.apache.org/hudson/job/Hadoop-Patch/3114/artifact/trunk/build/test/checkstyle-errors.html
          Console output: http://hudson.zones.apache.org/hudson/job/Hadoop-Patch/3114/console

          This message is automatically generated.

          Sharad Agarwal added a comment -

          The test failure is not due to this patch. TestMiniMRDFSSort.testMapReduceSort is failing due to HADOOP-3950.

          Devaraj Das added a comment -

          Minor comments:
          1) The SortedRanges class should be made package private (though this is not directly related to the patch)
          2) It may make sense to enable compression for the records written to the dfs
          3) In some places the indentation needs to be fixed.

          Sharad Agarwal added a comment -

          Incorporated Devaraj's comments.

          Hadoop QA added a comment -

          -1 overall. Here are the results of testing the latest attachment
          http://issues.apache.org/jira/secure/attachment/12389090/3828_v4.patch
          against trunk revision 689913.

          +1 @author. The patch does not contain any @author tags.

          +1 tests included. The patch appears to include 6 new or modified tests.

          +1 javadoc. The javadoc tool did not generate any warning messages.

          +1 javac. The applied patch does not increase the total number of javac compiler warnings.

          +1 findbugs. The patch does not introduce any new Findbugs warnings.

          +1 release audit. The applied patch does not increase the total number of release audit warnings.

          -1 core tests. The patch failed core unit tests.

          +1 contrib tests. The patch passed contrib unit tests.

          Test results: http://hudson.zones.apache.org/hudson/job/Hadoop-Patch/3138/testReport/
          Findbugs warnings: http://hudson.zones.apache.org/hudson/job/Hadoop-Patch/3138/artifact/trunk/build/test/findbugs/newPatchFindbugsWarnings.html
          Checkstyle results: http://hudson.zones.apache.org/hudson/job/Hadoop-Patch/3138/artifact/trunk/build/test/checkstyle-errors.html
          Console output: http://hudson.zones.apache.org/hudson/job/Hadoop-Patch/3138/console

          This message is automatically generated.

          Devaraj Das added a comment -

          I just committed this. Thanks, Sharad!

          Hudson added a comment -

          Integrated in Hadoop-trunk #589 (See http://hudson.zones.apache.org/hudson/job/Hadoop-trunk/589/)

            People

            • Assignee:
              Sharad Agarwal
              Reporter:
              Sharad Agarwal
            • Votes:
              0
              Watchers:
              1
