Hadoop Common
HADOOP-3829

Narrow down skipped records based on a user-acceptable value

    Details

    • Type: New Feature
    • Status: Closed
    • Priority: Major
    • Resolution: Fixed
    • Affects Version/s: 0.19.0
    • Fix Version/s: 0.19.0
    • Component/s: None
    • Labels:
      None
    • Hadoop Flags:
      Reviewed
    • Release Note:
      Introduced a new config parameter, set through org.apache.hadoop.mapred.SkipBadRecords.setMapperMaxSkipRecords, that bounds the range of records skipped in the neighborhood of a failed record.
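As a rough illustration of what this threshold means, here is a minimal stdlib-only Java sketch. It is not Hadoop's SkipBadRecords: the class SkipThresholdSketch, the use of java.util.Properties in place of a job configuration, the key string and the default of 0 are all assumptions made for illustration. A skipped range [start, end) is treated as acceptable when it contains no more records than the threshold.

```java
import java.util.Properties;

/**
 * Hypothetical sketch, not Hadoop's SkipBadRecords. Stores a per-bad-record
 * skip threshold in a Properties object and checks a skipped range against it.
 */
public class SkipThresholdSketch {
    // Key string assumed for illustration only.
    static final String MAPPER_MAX_SKIP_RECORDS = "mapred.skip.map.max.skip.records";

    static void setMapperMaxSkipRecords(Properties conf, long maxSkipRecs) {
        conf.setProperty(MAPPER_MAX_SKIP_RECORDS, Long.toString(maxSkipRecs));
    }

    static long getMapperMaxSkipRecords(Properties conf) {
        // Assumed default when the key is absent: 0.
        return Long.parseLong(conf.getProperty(MAPPER_MAX_SKIP_RECORDS, "0"));
    }

    /** A half-open skipped range [start, end) is acceptable if its size is within the threshold. */
    static boolean isAcceptable(long start, long end, long maxSkipRecs) {
        return end - start <= maxSkipRecs;
    }

    public static void main(String[] args) {
        Properties conf = new Properties();
        setMapperMaxSkipRecords(conf, 1);
        long max = getMapperMaxSkipRecords(conf);
        System.out.println(isAcceptable(40, 50, max)); // false: 10 records would be skipped
        System.out.println(isAcceptable(42, 43, max)); // true: only the bad record is skipped
    }
}
```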

      Description

      This is an incremental step over HADOOP-153.
      If the number of skipped records in the neighborhood of a bad record is not acceptable to the user, narrow down the skipped range to the user-acceptable value.

      1. 3829_v4.patch (45 kB, Sharad Agarwal)
      2. 3829_v3.patch (36 kB, Sharad Agarwal)
      3. 3829_v2.patch (36 kB, Sharad Agarwal)
      4. 3829_v1.patch (23 kB, Sharad Agarwal)
      5. 3829_v1.153_7.patch (19 kB, Sharad Agarwal)


          Activity

          Hudson added a comment - Integrated in Hadoop-trunk #611 (See http://hudson.zones.apache.org/hudson/job/Hadoop-trunk/611/ )
          Devaraj Das added a comment -

          I just committed this. Thanks, Sharad!

          Sharad Agarwal added a comment -

          TestFileAppend2.testComplexAppend failed on Hudson; the failure is unrelated to this patch.

          Hadoop QA added a comment -

          -1 overall. Here are the results of testing the latest attachment
          http://issues.apache.org/jira/secure/attachment/12390255/3829_v4.patch
          against trunk revision 696525.

          +1 @author. The patch does not contain any @author tags.

          +1 tests included. The patch appears to include 11 new or modified tests.

          +1 javadoc. The javadoc tool did not generate any warning messages.

          +1 javac. The applied patch does not increase the total number of javac compiler warnings.

          +1 findbugs. The patch does not introduce any new Findbugs warnings.

          -1 core tests. The patch failed core unit tests.

          +1 contrib tests. The patch passed contrib unit tests.

          Test results: http://hudson.zones.apache.org/hudson/job/Hadoop-Patch/3293/testReport/
          Findbugs warnings: http://hudson.zones.apache.org/hudson/job/Hadoop-Patch/3293/artifact/trunk/build/test/findbugs/newPatchFindbugsWarnings.html
          Checkstyle results: http://hudson.zones.apache.org/hudson/job/Hadoop-Patch/3293/artifact/trunk/build/test/checkstyle-errors.html
          Console output: http://hudson.zones.apache.org/hudson/job/Hadoop-Patch/3293/console

          This message is automatically generated.

          Sharad Agarwal added a comment -

          ant test passed on my machine.

          Sharad Agarwal added a comment -

          test-patch passed on my machine.

          [exec] +1 overall.

          [exec] +1 @author. The patch does not contain any @author tags.

          [exec] +1 tests included. The patch appears to include 11 new or modified tests.

          [exec] +1 javadoc. The javadoc tool did not generate any warning messages.

          [exec] +1 javac. The applied patch does not increase the total number of javac compiler warnings.

          [exec] +1 findbugs. The patch does not introduce any new Findbugs warnings.

          Sharad Agarwal added a comment -

          Fixed an issue in ReduceTask#SkippingReduceValuesIterator.
          Added more documentation to SkipBadRecords.
          Made writing of skip records optional.
          Incorporated Devaraj's offline comment to remove the SkipBadRecords.ENABLED flag as it is now redundant after the addition of MAPPER_MAX_SKIP_RECORDS/REDUCER_MAX_SKIP_GROUPS.
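Since the separate ENABLED flag is gone, one way to read this change is that skipping is switched on whenever a threshold is positive. The stdlib-only Java sketch below illustrates that reading; it is an assumption, not the patch's code, and the class name, the Properties stand-in for a job configuration, the key strings and the treatment of a missing key as 0 are all hypothetical.

```java
import java.util.Properties;

/**
 * Hypothetical sketch: with no ENABLED flag, whether skipping is active is
 * derived from the thresholds themselves (assumed: a value > 0 means enabled).
 */
public class SkipEnabledSketch {
    // Key strings assumed for illustration only.
    static final String MAPPER_MAX_SKIP_RECORDS = "mapred.skip.map.max.skip.records";
    static final String REDUCER_MAX_SKIP_GROUPS = "mapred.skip.reduce.max.skip.groups";

    private static long getLong(Properties conf, String key) {
        // A missing key is treated as 0, i.e. skipping disabled.
        return Long.parseLong(conf.getProperty(key, "0"));
    }

    static boolean isMapSkippingEnabled(Properties conf) {
        return getLong(conf, MAPPER_MAX_SKIP_RECORDS) > 0;
    }

    static boolean isReduceSkippingEnabled(Properties conf) {
        return getLong(conf, REDUCER_MAX_SKIP_GROUPS) > 0;
    }

    public static void main(String[] args) {
        Properties conf = new Properties();
        System.out.println(isMapSkippingEnabled(conf));    // false: nothing configured
        conf.setProperty(MAPPER_MAX_SKIP_RECORDS, "100");
        System.out.println(isMapSkippingEnabled(conf));    // true
        System.out.println(isReduceSkippingEnabled(conf)); // false: reducer threshold still 0
    }
}
```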

          Hadoop QA added a comment -

          +1 overall. Here are the results of testing the latest attachment
          http://issues.apache.org/jira/secure/attachment/12389656/3829_v3.patch
          against trunk revision 692996.

          +1 @author. The patch does not contain any @author tags.

          +1 tests included. The patch appears to include 5 new or modified tests.

          +1 javadoc. The javadoc tool did not generate any warning messages.

          +1 javac. The applied patch does not increase the total number of javac compiler warnings.

          +1 findbugs. The patch does not introduce any new Findbugs warnings.

          +1 core tests. The patch passed core unit tests.

          +1 contrib tests. The patch passed contrib unit tests.

          Test results: http://hudson.zones.apache.org/hudson/job/Hadoop-Patch/3205/testReport/
          Findbugs warnings: http://hudson.zones.apache.org/hudson/job/Hadoop-Patch/3205/artifact/trunk/build/test/findbugs/newPatchFindbugsWarnings.html
          Checkstyle results: http://hudson.zones.apache.org/hudson/job/Hadoop-Patch/3205/artifact/trunk/build/test/checkstyle-errors.html
          Console output: http://hudson.zones.apache.org/hudson/job/Hadoop-Patch/3205/console

          This message is automatically generated.

          Sharad Agarwal added a comment -

          Updated to the latest trunk.

          Sharad Agarwal added a comment -

          Fairly tested patch. A few additional changes:

          • Moved counters from Counters.java to SkipBadRecords.java, as they are specific to the skip feature.
          • Fixed hasNext in SortedRanges.SkipRangeIterator.
          • Skipped records are not written to HDFS during a test attempt. A test attempt determines whether a range is good or bad: only records in the test range are passed to the mapper/reducer, and all others are skipped.
          • recordreader.next is not called beyond the test range during a test attempt.
          • Renamed failedRanges to skipRanges in Task.java.
          • Added config params to hadoop-default.
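Iterating over record indices while jumping over sorted skip ranges can be sketched as follows. This is a hypothetical stand-in, not the actual SortedRanges.SkipRangeIterator: the class name and the representation of ranges as sorted, non-overlapping, half-open [start, end) pairs are assumptions.

```java
import java.util.ArrayList;
import java.util.Iterator;
import java.util.List;
import java.util.NoSuchElementException;

/** Hypothetical sketch: yields indices in [0, limit) that fall outside all skip ranges. */
public class SkipRangeIteratorSketch implements Iterator<Long> {
    private final long[][] ranges; // sorted, non-overlapping [start, end) pairs
    private final long limit;      // iterate over indices [0, limit)
    private int rangeIdx = 0;      // first range not yet passed
    private long next = 0;         // next index to return

    public SkipRangeIteratorSketch(long[][] ranges, long limit) {
        this.ranges = ranges;
        this.limit = limit;
        advancePastSkips();
    }

    // Move `next` forward while it lies inside (or past) the current skip range.
    private void advancePastSkips() {
        while (rangeIdx < ranges.length && next >= ranges[rangeIdx][0]) {
            if (next < ranges[rangeIdx][1]) {
                next = ranges[rangeIdx][1];
            }
            rangeIdx++;
        }
    }

    @Override
    public boolean hasNext() {
        return next < limit;
    }

    @Override
    public Long next() {
        if (!hasNext()) {
            throw new NoSuchElementException();
        }
        long current = next++;
        advancePastSkips();
        return current;
    }

    public static void main(String[] args) {
        long[][] skips = { {2, 4}, {6, 7} };
        List<Long> kept = new ArrayList<>();
        Iterator<Long> it = new SkipRangeIteratorSketch(skips, 10);
        while (it.hasNext()) {
            kept.add(it.next());
        }
        System.out.println(kept); // [0, 1, 4, 5, 7, 8, 9]
    }
}
```

Advancing past skip ranges eagerly, before hasNext() is called, keeps hasNext() accurate even when the last records before the limit fall inside a skip range.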
          Sharad Agarwal added a comment -

          Attaching a working patch while I continue testing.

          Sharad Agarwal added a comment -

          This patch depends on the patch from HADOOP-153. Please apply 153_7.patch before applying this one.

          The approach was also discussed earlier in HADOOP-153. In brief:

          Defines user-configurable MAPPER_MAX_SKIP_RECORDS/MAPPER_REDUCE_SKIP_RECORDS: the number of skipped records acceptable in the neighborhood of a bad record.
          If the skipped range is larger than this threshold, the task tries to narrow it down with a binary-search-like algorithm during task re-executions, until the threshold is met or all task attempts are exhausted. The skipped range is divided into two halves and only one half is executed. Based on whether the subsequent attempt fails, the task determines which half contains the bad record.
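A minimal simulation of the halving described above, in Java. It is a sketch under stated assumptions, not the patch's code: NarrowSkipRangeSketch and RangeRunner are hypothetical names, a "task attempt" is modeled as a callback that reports whether running only the test range fails, the range is assumed to hold exactly one bad record, and the first half is always the one tested.

```java
/**
 * Hypothetical simulation of narrowing a skipped range by halving.
 * Not Hadoop's implementation; names and structure are illustrative.
 */
public class NarrowSkipRangeSketch {

    /** Models one task attempt: returns true if running only [start, end) fails. */
    interface RangeRunner {
        boolean fails(long start, long end);
    }

    /**
     * Narrows the skipped range [start, end), assumed to hold exactly one bad
     * record, until it has at most maxSkip records or maxAttempts are used up.
     * Returns the final range as {start, end}.
     */
    static long[] narrow(long start, long end, long maxSkip, int maxAttempts,
                         RangeRunner runner) {
        if (maxSkip < 1) {
            throw new IllegalArgumentException("maxSkip must be at least 1");
        }
        int attempts = 0;
        while (end - start > maxSkip && attempts < maxAttempts) {
            long mid = start + (end - start) / 2;
            attempts++;
            // Test attempt: only records in the first half [start, mid) are processed.
            if (runner.fails(start, mid)) {
                end = mid;   // failure reproduced: the bad record is in the tested half
            } else {
                start = mid; // tested half is clean: the bad record is in [mid, end)
            }
        }
        return new long[] {start, end};
    }

    public static void main(String[] args) {
        long badRecord = 37;
        RangeRunner runner = (s, e) -> badRecord >= s && badRecord < e;
        long[] range = narrow(0, 100, 1, 10, runner);
        System.out.println("[" + range[0] + ", " + range[1] + ")"); // prints [37, 38)
    }
}
```

Each attempt roughly halves the range, so narrowing N records down to a threshold of 1 takes about log2(N) attempts; this is why the number of allowed task attempts limits how far the range can shrink.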


            People

            • Assignee:
              Sharad Agarwal
              Reporter:
              Sharad Agarwal
            • Votes:
              0
              Watchers:
              1
