Hadoop Map/Reduce
MAPREDUCE-773

LineRecordReader can report non-zero progress while it is processing a compressed stream

    Details

    • Type: Bug
    • Status: Closed
    • Priority: Major
    • Resolution: Fixed
    • Affects Version/s: None
    • Fix Version/s: 0.21.0
    • Component/s: task
    • Labels:
      None
    • Hadoop Flags:
      Reviewed
    • Release Note:
      Modifies LineRecordReader to report an approximate progress, instead of just returning 0, when using compressed streams.

      Description

      Currently, the LineRecordReader returns 0.0 from getProgress() for most inputs (since the "end" of the filesplit is set to Long.MAX_VALUE for compressed inputs). This can be improved to return a non-zero progress even for compressed streams (though it may not be very reflective of the actual progress).
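To illustrate why the reported progress stays at (or extremely close to) zero, here is a minimal sketch of a ratio-style progress computation. It is not the actual LineRecordReader source: the class and method names (`ProgressSketch`, `progress`) are hypothetical, and the formula (pos - start) / (end - start) is an assumption about how progress is derived from the split bounds.

```java
public class ProgressSketch {
    // Assumed formula: fraction of the split consumed so far, capped at 1.0.
    public static float progress(long start, long end, long pos) {
        if (start == end) {
            return 0.0f;
        }
        return Math.min(1.0f, (pos - start) / (float) (end - start));
    }

    public static void main(String[] args) {
        long start = 0L;
        long splitLength = 64L * 1024 * 1024; // hypothetical 64 MB split
        long pos = splitLength / 2;           // halfway through the split

        // Uncompressed input: "end" is the real end of the split.
        System.out.println(progress(start, start + splitLength, pos));

        // Compressed input: "end" is Long.MAX_VALUE, so the ratio is
        // vanishingly small no matter how much has been read.
        System.out.println(progress(start, Long.MAX_VALUE, pos));
    }
}
```

Under these assumptions, an approximate figure for compressed input could be obtained by passing the real split end together with the position in the underlying compressed file, at the cost of not tracking decompressed bytes exactly.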

      1. 773.2.patch
        6 kB
        Devaraj Das
      2. 773.3.patch
        10 kB
        Devaraj Das
      3. 773.patch
        4 kB
        Devaraj Das
      4. 773.patch
        4 kB
        Devaraj Das

        Activity

        Chris Douglas added a comment -

        Filed MAPREDUCE-946

        Hong Tang added a comment -

        My bad. I overlooked the difference between these two similar places.

        Chris Douglas added a comment -

        This change does not preserve the existing behavior:

        -                            Math.max((int)Math.min(Integer.MAX_VALUE, end-pos),
        -                                     maxLineLength));
        +                            Math.max(maxBytesToConsume(), maxLineLength));
        
        +  private int maxBytesToConsume() {
        +    return (isCompressedInput()) ? Integer.MAX_VALUE
        +                           : (int) Math.min(Integer.MAX_VALUE, (end - start));
        +  }
        

        Instead of end - pos, this uses end - start whenever that value is less than Integer.MAX_VALUE. This is a regression of HADOOP-3144.
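The difference between the two expressions can be seen with a small sketch. The class and method names below (`ReadLimitSketch`, `limitFromStart`, `limitFromPos`) are hypothetical and not part of the patch; the sketch covers only the uncompressed case and leaves out the surrounding Math.max with maxLineLength.

```java
public class ReadLimitSketch {
    // Expression used by the patched helper for uncompressed input:
    // the full split length, regardless of how far the reader has advanced.
    public static int limitFromStart(long start, long end) {
        return (int) Math.min(Integer.MAX_VALUE, end - start);
    }

    // Expression used before the patch: bytes remaining in the split
    // from the current position.
    public static int limitFromPos(long pos, long end) {
        return (int) Math.min(Integer.MAX_VALUE, end - pos);
    }

    public static void main(String[] args) {
        long start = 1000L;
        long end = 2000L;
        long pos = 1900L; // the reader is near the end of its split

        System.out.println(limitFromStart(start, end)); // prints 1000
        System.out.println(limitFromPos(pos, end));     // prints 100
    }
}
```

With these values the start-based limit permits consuming ten times as many bytes as remain in the split, which is the behavior change the comment points out.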

        Devaraj Das added a comment -

        I just committed this.

        Hadoop QA added a comment -

        -1 overall. Here are the results of testing the latest attachment
        http://issues.apache.org/jira/secure/attachment/12416766/773.3.patch
        against trunk revision 805081.

        +1 @author. The patch does not contain any @author tags.

        +1 tests included. The patch appears to include 3 new or modified tests.

        +1 javadoc. The javadoc tool did not generate any warning messages.

        +1 javac. The applied patch does not increase the total number of javac compiler warnings.

        +1 findbugs. The patch does not introduce any new Findbugs warnings.

        +1 release audit. The applied patch does not increase the total number of release audit warnings.

        -1 core tests. The patch failed core unit tests.

        -1 contrib tests. The patch failed contrib unit tests.

        Test results: http://hudson.zones.apache.org/hudson/job/Mapreduce-Patch-vesta.apache.org/486/testReport/
        Findbugs warnings: http://hudson.zones.apache.org/hudson/job/Mapreduce-Patch-vesta.apache.org/486/artifact/trunk/build/test/findbugs/newPatchFindbugsWarnings.html
        Checkstyle results: http://hudson.zones.apache.org/hudson/job/Mapreduce-Patch-vesta.apache.org/486/artifact/trunk/build/test/checkstyle-errors.html
        Console output: http://hudson.zones.apache.org/hudson/job/Mapreduce-Patch-vesta.apache.org/486/console

        This message is automatically generated.

        Hong Tang added a comment -

        Patch looks good. +1.

        Devaraj Das added a comment -

        Have manually tested this patch. Can't think of any good specific testcase for testing this.

        Devaraj Das added a comment -

        Hong suggested some changes to the patch offline, and they are incorporated here. Hong also fixed a leak of native direct buffers. This patch has both fixes.
        ant test/test-patch passed. Some unrelated tests failed but there are jiras for them (MAPREDUCE-839, MAPREDUCE-882, MAPREDUCE-879).

        Devaraj Das added a comment -

        The attached patch has some fixes. I ran 'ant test' and test-patch locally and they both passed.

        Hadoop QA added a comment -

        -1 overall. Here are the results of testing the latest attachment
        http://issues.apache.org/jira/secure/attachment/12414093/773.patch
        against trunk revision 797362.

        +1 @author. The patch does not contain any @author tags.

        -1 tests included. The patch doesn't appear to include any new or modified tests.
        Please justify why no new tests are needed for this patch.
        Also please list what manual steps were performed to verify this patch.

        +1 javadoc. The javadoc tool did not generate any warning messages.

        +1 javac. The applied patch does not increase the total number of javac compiler warnings.

        -1 findbugs. The patch appears to cause Findbugs to fail.

        +1 release audit. The applied patch does not increase the total number of release audit warnings.

        -1 core tests. The patch failed core unit tests.

        -1 contrib tests. The patch failed contrib unit tests.

        Test results: http://hudson.zones.apache.org/hudson/job/Mapreduce-Patch-vesta.apache.org/417/testReport/
        Checkstyle results: http://hudson.zones.apache.org/hudson/job/Mapreduce-Patch-vesta.apache.org/417/artifact/trunk/build/test/checkstyle-errors.html
        Console output: http://hudson.zones.apache.org/hudson/job/Mapreduce-Patch-vesta.apache.org/417/console

        This message is automatically generated.

        Devaraj Das added a comment -

        Resubmitting to hudson

        Devaraj Das added a comment -

        Patch synced with the trunk. Talked with Chris offline and he is okay with this patch getting in now.

        Owen O'Malley added a comment -

        Chris is asking that this be held until HADOOP-4010 goes in.

        Hong Tang added a comment -

        @devaraj, +1.
        @arun, yes.

        Arun C Murthy added a comment -

        I'm guessing this will help with speculative execution... am I right?

        Devaraj Das added a comment -

        Straightforward patch.


          People

          • Assignee:
            Devaraj Das
            Reporter:
            Devaraj Das
          • Votes:
            0
            Watchers:
            6
