Hadoop Map/Reduce: MAPREDUCE-773

LineRecordReader can report non-zero progress while it is processing a compressed stream

    Details

    • Type: Bug
    • Status: Closed
    • Priority: Major
    • Resolution: Fixed
    • Affects Version/s: None
    • Fix Version/s: 0.21.0
    • Component/s: task
    • Labels:
      None
    • Hadoop Flags:
      Reviewed
    • Release Note:
      Modifies LineRecordReader to report an approximate progress, instead of just returning 0, when using compressed streams.

      Description

      Currently, the LineRecordReader returns 0.0 from getProgress() for most inputs (since the "end" of the filesplit is set to Long.MAX_VALUE for compressed inputs). This can be improved to return a non-zero progress even for compressed streams (though it may not be very reflective of the actual progress).
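The effect described above can be reproduced with a small sketch. The `progress` helper below is a hypothetical stand-in for the reader's progress formula (fraction of the split consumed), not the actual LineRecordReader code, and the byte counts are made-up values. The last call shows one plausible approximation, tracking bytes consumed from the underlying compressed file against its compressed length; that is an assumption for illustration, not necessarily what the patch does.

```java
class ProgressSketch {
    // Hypothetical progress formula: fraction of the split consumed so far.
    static float progress(long start, long pos, long end) {
        if (start == end) {
            return 0.0f;
        }
        return Math.min(1.0f, (pos - start) / (float) (end - start));
    }

    public static void main(String[] args) {
        long start = 0L;
        long pos = 64L * 1024 * 1024; // bytes read so far (illustrative)

        // Uncompressed split: "end" is a real byte offset, so the ratio is meaningful.
        System.out.println(progress(start, pos, 128L * 1024 * 1024));

        // Compressed input: "end" is Long.MAX_VALUE, so the ratio is effectively zero.
        System.out.println(progress(start, pos, Long.MAX_VALUE));

        // Assumed approximation: compare compressed bytes consumed with the
        // compressed length of the split. Both numbers are hypothetical.
        long compressedBytesRead = 16L * 1024 * 1024;
        long compressedLength = 32L * 1024 * 1024;
        System.out.println(progress(0L, compressedBytesRead, compressedLength));
    }
}
```

Because decompressors read ahead in blocks, a position taken from the compressed stream only loosely tracks how many records have been emitted, which matches the caveat that the reported value may not reflect actual progress closely.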

      1. 773.3.patch
        10 kB
        Devaraj Das
      2. 773.2.patch
        6 kB
        Devaraj Das
      3. 773.patch
        4 kB
        Devaraj Das
      4. 773.patch
        4 kB
        Devaraj Das

        Activity

        Devaraj Das created issue -
        Devaraj Das added a comment -

        Straightforward patch.

        Devaraj Das made changes -
        Field Original Value New Value
        Attachment 773.patch [ 12413999 ]
        Devaraj Das made changes -
        Status Open [ 1 ] Patch Available [ 10002 ]
        Arun C Murthy added a comment -

        I'm guessing this will help with speculative execution... am I right?

        Hong Tang added a comment -

        @devaraj, +1.
        @arun, yes.

        Owen O'Malley added a comment -

        Chris is asking that this be held until HADOOP-4010 goes in first.

        Devaraj Das added a comment -

        Patch synced with the trunk. Talked with Chris offline and he is okay with this patch getting in now.

        Devaraj Das made changes -
        Attachment 773.patch [ 12414093 ]
        Devaraj Das added a comment -

        Resubmitting to hudson

        Devaraj Das made changes -
        Status Patch Available [ 10002 ] Open [ 1 ]
        Devaraj Das made changes -
        Status Open [ 1 ] Patch Available [ 10002 ]
        Hadoop QA added a comment -

        -1 overall. Here are the results of testing the latest attachment
        http://issues.apache.org/jira/secure/attachment/12414093/773.patch
        against trunk revision 797362.

        +1 @author. The patch does not contain any @author tags.

        -1 tests included. The patch doesn't appear to include any new or modified tests.
        Please justify why no new tests are needed for this patch.
        Also please list what manual steps were performed to verify this patch.

        +1 javadoc. The javadoc tool did not generate any warning messages.

        +1 javac. The applied patch does not increase the total number of javac compiler warnings.

        -1 findbugs. The patch appears to cause Findbugs to fail.

        +1 release audit. The applied patch does not increase the total number of release audit warnings.

        -1 core tests. The patch failed core unit tests.

        -1 contrib tests. The patch failed contrib unit tests.

        Test results: http://hudson.zones.apache.org/hudson/job/Mapreduce-Patch-vesta.apache.org/417/testReport/
        Checkstyle results: http://hudson.zones.apache.org/hudson/job/Mapreduce-Patch-vesta.apache.org/417/artifact/trunk/build/test/checkstyle-errors.html
        Console output: http://hudson.zones.apache.org/hudson/job/Mapreduce-Patch-vesta.apache.org/417/console

        This message is automatically generated.

        Devaraj Das added a comment -

        The attached patch has some fixes. I ran 'ant test' and test-patch locally and they both passed.

        Devaraj Das made changes -
        Attachment 773.2.patch [ 12415500 ]
        Devaraj Das made changes -
        Status Patch Available [ 10002 ] Open [ 1 ]
        Devaraj Das added a comment -

        Hong suggested some changes to the patch offline, and they are incorporated here. Hong also fixed a leak of native direct buffers. This patch has both fixes.
        ant test/test-patch passed. Some unrelated tests failed but there are jiras for them (MAPREDUCE-839, MAPREDUCE-882, MAPREDUCE-879).

        Devaraj Das made changes -
        Attachment 773.3.patch [ 12416766 ]
        Devaraj Das added a comment -

        Have manually tested this patch. Can't think of any good specific testcase for testing this.

        Devaraj Das made changes -
        Status Open [ 1 ] Patch Available [ 10002 ]
        Hong Tang added a comment -

        Patch looks good. +1.

        Hadoop QA added a comment -

        -1 overall. Here are the results of testing the latest attachment
        http://issues.apache.org/jira/secure/attachment/12416766/773.3.patch
        against trunk revision 805081.

        +1 @author. The patch does not contain any @author tags.

        +1 tests included. The patch appears to include 3 new or modified tests.

        +1 javadoc. The javadoc tool did not generate any warning messages.

        +1 javac. The applied patch does not increase the total number of javac compiler warnings.

        +1 findbugs. The patch does not introduce any new Findbugs warnings.

        +1 release audit. The applied patch does not increase the total number of release audit warnings.

        -1 core tests. The patch failed core unit tests.

        -1 contrib tests. The patch failed contrib unit tests.

        Test results: http://hudson.zones.apache.org/hudson/job/Mapreduce-Patch-vesta.apache.org/486/testReport/
        Findbugs warnings: http://hudson.zones.apache.org/hudson/job/Mapreduce-Patch-vesta.apache.org/486/artifact/trunk/build/test/findbugs/newPatchFindbugsWarnings.html
        Checkstyle results: http://hudson.zones.apache.org/hudson/job/Mapreduce-Patch-vesta.apache.org/486/artifact/trunk/build/test/checkstyle-errors.html
        Console output: http://hudson.zones.apache.org/hudson/job/Mapreduce-Patch-vesta.apache.org/486/console

        This message is automatically generated.

        Devaraj Das added a comment -

        I just committed this.

        Devaraj Das made changes -
        Status Patch Available [ 10002 ] Resolved [ 5 ]
        Hadoop Flags [Reviewed]
        Resolution Fixed [ 1 ]
        Chris Douglas added a comment -

        This change does not preserve the existing behavior:

        -                            Math.max((int)Math.min(Integer.MAX_VALUE, end-pos),
        -                                     maxLineLength));
        +                            Math.max(maxBytesToConsume(), maxLineLength));
        
        +  private int maxBytesToConsume() {
        +    return (isCompressedInput()) ? Integer.MAX_VALUE
        +                           : (int) Math.min(Integer.MAX_VALUE, (end - start));
        +  }
        

        Instead of end - pos, the new code uses end - start when that is less than Integer.MAX_VALUE. This is a regression in HADOOP-3144.
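To illustrate the difference being pointed out, here is a minimal sketch with made-up offsets. The method names `remainingInSplit` and `wholeSplit` are hypothetical and only mirror the two expressions in the diff above for the uncompressed case; this is not the LineRecordReader code itself.

```java
class MaxBytesSketch {
    // Expression removed by the patch: bounded by the bytes left in the split.
    static int remainingInSplit(long pos, long end) {
        return (int) Math.min(Integer.MAX_VALUE, end - pos);
    }

    // Expression added by the patch (uncompressed case): bounded by the whole split length.
    static int wholeSplit(long start, long end) {
        return (int) Math.min(Integer.MAX_VALUE, end - start);
    }

    public static void main(String[] args) {
        long start = 0L;   // illustrative split start
        long end = 1000L;  // illustrative split end
        long pos = 900L;   // reader is 900 bytes into the split

        System.out.println(remainingInSplit(pos, end)); // prints 100
        System.out.println(wholeSplit(start, end));     // prints 1000
    }
}
```

With the reader near the end of its split, the old bound shrinks to the 100 bytes that remain, while the new bound stays at the full 1000 bytes regardless of position.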

        Hong Tang added a comment -

        My bad. I overlooked the difference between these two similar places.

        Chris Douglas added a comment -

        Filed MAPREDUCE-946

        Jothi Padmanabhan made changes -
        Release Note Modifies LineRecordReader to report an approximate progress, instead of just returning 0, when using compressed streams.
        Tom White made changes -
        Status Resolved [ 5 ] Closed [ 6 ]
        Transition                   Time In Source Status   Execution Times   Last Executer   Last Execution Date
        Patch Available -> Open      27d 22h 51m             2                 Devaraj Das     17/Aug/09 11:27
        Open -> Patch Available      2h 23m                  3                 Devaraj Das     17/Aug/09 13:28
        Patch Available -> Resolved  20h 26m                 1                 Devaraj Das     18/Aug/09 09:55
        Resolved -> Closed           371d 12h 19m            1                 Tom White       24/Aug/10 22:14

          People

          • Assignee:
            Devaraj Das
            Reporter:
            Devaraj Das
          • Votes:
            0
            Watchers:
            6
