Hadoop Common
  1. Hadoop Common
  2. HADOOP-2826

FileSplit.getFile(), LineRecordReader. readLine() need to be removed

    Details

    • Type: Bug Bug
    • Status: Closed
    • Priority: Major Major
    • Resolution: Fixed
    • Affects Version/s: None
    • Fix Version/s: 0.17.0
    • Component/s: None
    • Labels:
      None
    • Hadoop Flags:
      Incompatible change, Reviewed
    • Release Note:
      Hide
      The deprecated methods, public File org.apache.hadoop.mapred.FileSplit.getFile() and
        public static long org.apache.hadoop.mapred.LineRecordReader.readLine(InputStream in, OutputStream out)
      are removed.
      The constructor org.apache.hadoop.mapred.LineRecordReader.LineReader(InputStream in, Configuration conf) 's visibility is made public.
      The signature of the public org.apache.hadoop.streaming.UTF8ByteArrayUtils.readLIne(InputStream) method is changed to UTF8ByteArrayUtils.readLIne(LineReader, Text). Since the old signature is not deprecated, any code using the old method must be changed to use the new method.
      Show
      The deprecated methods, public File org.apache.hadoop.mapred.FileSplit.getFile() and   public static long org.apache.hadoop.mapred.LineRecordReader.readLine(InputStream in, OutputStream out) are removed. The constructor org.apache.hadoop.mapred.LineRecordReader.LineReader(InputStream in, Configuration conf) 's visibility is made public. The signature of the public org.apache.hadoop.streaming.UTF8ByteArrayUtils.readLIne(InputStream) method is changed to UTF8ByteArrayUtils.readLIne(LineReader, Text). Since the old signature is not deprecated, any code using the old method must be changed to use the new method.

      Description

      The methods FileSplit.getFile(), LineRecordReader. readLine() need to be removed as they are deprecated.

      1. patch-2826.txt
        12 kB
        Amareshwari Sriramadasu
      2. patch-2826.txt
        13 kB
        Amareshwari Sriramadasu
      3. patch-2826.txt
        14 kB
        Amareshwari Sriramadasu
      4. patch-2826.txt
        14 kB
        Amareshwari Sriramadasu

        Activity

        Hide
        Amareshwari Sriramadasu added a comment -

        Here is patch removing FileSplit.getFile() and LineRecordReader.readLine()

        Show
        Amareshwari Sriramadasu added a comment - Here is patch removing FileSplit.getFile() and LineRecordReader.readLine()
        Hide
        Hadoop QA added a comment -

        -1 overall. Here are the results of testing the latest attachment
        http://issues.apache.org/jira/secure/attachment/12378944/patch-2826.txt
        against trunk revision 619744.

        @author +1. The patch does not contain any @author tags.

        tests included +1. The patch appears to include 6 new or modified tests.

        patch -1. The patch command could not apply the patch.

        Console output: http://hudson.zones.apache.org/hudson/job/Hadoop-Patch/2102/console

        This message is automatically generated.

        Show
        Hadoop QA added a comment - -1 overall. Here are the results of testing the latest attachment http://issues.apache.org/jira/secure/attachment/12378944/patch-2826.txt against trunk revision 619744. @author +1. The patch does not contain any @author tags. tests included +1. The patch appears to include 6 new or modified tests. patch -1. The patch command could not apply the patch. Console output: http://hudson.zones.apache.org/hudson/job/Hadoop-Patch/2102/console This message is automatically generated.
        Hide
        Mahadev konar added a comment -

        instead of public static byte [] readLine(LineReader lineReader)

        why not pass a text object ?

        public static readLine(LineReader lineReader, Text t)

        read into t
        }

        the call would be something like ....
        t.clear()
        call readline(reader, t)

        that would save creating object Text for each readline invocation?

        Show
        Mahadev konar added a comment - instead of public static byte [] readLine(LineReader lineReader) why not pass a text object ? public static readLine(LineReader lineReader, Text t) read into t } the call would be something like .... t.clear() call readline(reader, t) that would save creating object Text for each readline invocation?
        Hide
        Amareshwari Sriramadasu added a comment -

        Canceling patch to address Mahadev's comment.

        Show
        Amareshwari Sriramadasu added a comment - Canceling patch to address Mahadev's comment.
        Hide
        Amareshwari Sriramadasu added a comment -

        Here is patch addressing Mahadev's suggestion.

        Show
        Amareshwari Sriramadasu added a comment - Here is patch addressing Mahadev's suggestion.
        Hide
        Hadoop QA added a comment -

        +1 overall. Here are the results of testing the latest attachment
        http://issues.apache.org/jira/secure/attachment/12379102/patch-2826.txt
        against trunk revision 643282.

        @author +1. The patch does not contain any @author tags.

        tests included +1. The patch appears to include 6 new or modified tests.

        javadoc +1. The javadoc tool did not generate any warning messages.

        javac +1. The applied patch does not generate any new javac compiler warnings.

        release audit +1. The applied patch does not generate any new release audit warnings.

        findbugs +1. The patch does not introduce any new Findbugs warnings.

        core tests +1. The patch passed core unit tests.

        contrib tests +1. The patch passed contrib unit tests.

        Test results: http://hudson.zones.apache.org/hudson/job/Hadoop-Patch/2119/testReport/
        Findbugs warnings: http://hudson.zones.apache.org/hudson/job/Hadoop-Patch/2119/artifact/trunk/build/test/findbugs/newPatchFindbugsWarnings.html
        Checkstyle results: http://hudson.zones.apache.org/hudson/job/Hadoop-Patch/2119/artifact/trunk/build/test/checkstyle-errors.html
        Console output: http://hudson.zones.apache.org/hudson/job/Hadoop-Patch/2119/console

        This message is automatically generated.

        Show
        Hadoop QA added a comment - +1 overall. Here are the results of testing the latest attachment http://issues.apache.org/jira/secure/attachment/12379102/patch-2826.txt against trunk revision 643282. @author +1. The patch does not contain any @author tags. tests included +1. The patch appears to include 6 new or modified tests. javadoc +1. The javadoc tool did not generate any warning messages. javac +1. The applied patch does not generate any new javac compiler warnings. release audit +1. The applied patch does not generate any new release audit warnings. findbugs +1. The patch does not introduce any new Findbugs warnings. core tests +1. The patch passed core unit tests. contrib tests +1. The patch passed contrib unit tests. Test results: http://hudson.zones.apache.org/hudson/job/Hadoop-Patch/2119/testReport/ Findbugs warnings: http://hudson.zones.apache.org/hudson/job/Hadoop-Patch/2119/artifact/trunk/build/test/findbugs/newPatchFindbugsWarnings.html Checkstyle results: http://hudson.zones.apache.org/hudson/job/Hadoop-Patch/2119/artifact/trunk/build/test/checkstyle-errors.html Console output: http://hudson.zones.apache.org/hudson/job/Hadoop-Patch/2119/console This message is automatically generated.
        Hide
        Amareshwari Sriramadasu added a comment -

        Cancelling the patch, because In the patch pipemapred does a arrayCopy to get byteArray of the Text's length. But Text is doing a arrayCopy for readLine. This involves two array copies for each readline call of streaming.

        Show
        Amareshwari Sriramadasu added a comment - Cancelling the patch, because In the patch pipemapred does a arrayCopy to get byteArray of the Text's length. But Text is doing a arrayCopy for readLine. This involves two array copies for each readline call of streaming.
        Hide
        Amareshwari Sriramadasu added a comment -

        This patch removes the arrayCopy added in earlier patch and byte array is passed along with its for the split method. This solution should be
        better than the earlier since we are avoiding the additional arraycopy.

        Show
        Amareshwari Sriramadasu added a comment - This patch removes the arrayCopy added in earlier patch and byte array is passed along with its for the split method. This solution should be better than the earlier since we are avoiding the additional arraycopy.
        Hide
        Hadoop QA added a comment -

        -1 overall. Here are the results of testing the latest attachment
        http://issues.apache.org/jira/secure/attachment/12379123/patch-2826.txt
        against trunk revision 643282.

        @author +1. The patch does not contain any @author tags.

        tests included +1. The patch appears to include 6 new or modified tests.

        javadoc -1. The javadoc tool appears to have generated 1 warning messages.

        javac +1. The applied patch does not generate any new javac compiler warnings.

        release audit +1. The applied patch does not generate any new release audit warnings.

        findbugs +1. The patch does not introduce any new Findbugs warnings.

        core tests +1. The patch passed core unit tests.

        contrib tests +1. The patch passed contrib unit tests.

        Test results: http://hudson.zones.apache.org/hudson/job/Hadoop-Patch/2122/testReport/
        Findbugs warnings: http://hudson.zones.apache.org/hudson/job/Hadoop-Patch/2122/artifact/trunk/build/test/findbugs/newPatchFindbugsWarnings.html
        Checkstyle results: http://hudson.zones.apache.org/hudson/job/Hadoop-Patch/2122/artifact/trunk/build/test/checkstyle-errors.html
        Console output: http://hudson.zones.apache.org/hudson/job/Hadoop-Patch/2122/console

        This message is automatically generated.

        Show
        Hadoop QA added a comment - -1 overall. Here are the results of testing the latest attachment http://issues.apache.org/jira/secure/attachment/12379123/patch-2826.txt against trunk revision 643282. @author +1. The patch does not contain any @author tags. tests included +1. The patch appears to include 6 new or modified tests. javadoc -1. The javadoc tool appears to have generated 1 warning messages. javac +1. The applied patch does not generate any new javac compiler warnings. release audit +1. The applied patch does not generate any new release audit warnings. findbugs +1. The patch does not introduce any new Findbugs warnings. core tests +1. The patch passed core unit tests. contrib tests +1. The patch passed contrib unit tests. Test results: http://hudson.zones.apache.org/hudson/job/Hadoop-Patch/2122/testReport/ Findbugs warnings: http://hudson.zones.apache.org/hudson/job/Hadoop-Patch/2122/artifact/trunk/build/test/findbugs/newPatchFindbugsWarnings.html Checkstyle results: http://hudson.zones.apache.org/hudson/job/Hadoop-Patch/2122/artifact/trunk/build/test/checkstyle-errors.html Console output: http://hudson.zones.apache.org/hudson/job/Hadoop-Patch/2122/console This message is automatically generated.
        Hide
        Mahadev konar added a comment -

        +1 . code looks good.

        Show
        Mahadev konar added a comment - +1 . code looks good.
        Hide
        Amareshwari Sriramadasu added a comment -

        Removed javadoc warning.

        Show
        Amareshwari Sriramadasu added a comment - Removed javadoc warning.
        Hide
        Hadoop QA added a comment -

        +1 overall. Here are the results of testing the latest attachment
        http://issues.apache.org/jira/secure/attachment/12379229/patch-2826.txt
        against trunk revision 643282.

        @author +1. The patch does not contain any @author tags.

        tests included +1. The patch appears to include 6 new or modified tests.

        javadoc +1. The javadoc tool did not generate any warning messages.

        javac +1. The applied patch does not generate any new javac compiler warnings.

        release audit +1. The applied patch does not generate any new release audit warnings.

        findbugs +1. The patch does not introduce any new Findbugs warnings.

        core tests +1. The patch passed core unit tests.

        contrib tests +1. The patch passed contrib unit tests.

        Test results: http://hudson.zones.apache.org/hudson/job/Hadoop-Patch/2141/testReport/
        Findbugs warnings: http://hudson.zones.apache.org/hudson/job/Hadoop-Patch/2141/artifact/trunk/build/test/findbugs/newPatchFindbugsWarnings.html
        Checkstyle results: http://hudson.zones.apache.org/hudson/job/Hadoop-Patch/2141/artifact/trunk/build/test/checkstyle-errors.html
        Console output: http://hudson.zones.apache.org/hudson/job/Hadoop-Patch/2141/console

        This message is automatically generated.

        Show
        Hadoop QA added a comment - +1 overall. Here are the results of testing the latest attachment http://issues.apache.org/jira/secure/attachment/12379229/patch-2826.txt against trunk revision 643282. @author +1. The patch does not contain any @author tags. tests included +1. The patch appears to include 6 new or modified tests. javadoc +1. The javadoc tool did not generate any warning messages. javac +1. The applied patch does not generate any new javac compiler warnings. release audit +1. The applied patch does not generate any new release audit warnings. findbugs +1. The patch does not introduce any new Findbugs warnings. core tests +1. The patch passed core unit tests. contrib tests +1. The patch passed contrib unit tests. Test results: http://hudson.zones.apache.org/hudson/job/Hadoop-Patch/2141/testReport/ Findbugs warnings: http://hudson.zones.apache.org/hudson/job/Hadoop-Patch/2141/artifact/trunk/build/test/findbugs/newPatchFindbugsWarnings.html Checkstyle results: http://hudson.zones.apache.org/hudson/job/Hadoop-Patch/2141/artifact/trunk/build/test/checkstyle-errors.html Console output: http://hudson.zones.apache.org/hudson/job/Hadoop-Patch/2141/console This message is automatically generated.
        Hide
        Devaraj Das added a comment -

        I just committed this. Thanks, Amareshwari!

        Show
        Devaraj Das added a comment - I just committed this. Thanks, Amareshwari!
        Hide
        Hudson added a comment -
        Show
        Hudson added a comment - Integrated in Hadoop-trunk #451 (See http://hudson.zones.apache.org/hudson/job/Hadoop-trunk/451/ )
        Hide
        Joydeep Sen Sarma added a comment -

        i was pulling these changes into our production environment. quick comment:

        we copy over an entire line (from readLine) and then we break it into two strings by splitting on tab. So there is an extra scan of the input data and an extra copy based on splitting by tab.

        instead if we generalized LineReader to instead read until it hits a delimiter - then we can do it with one less scan and copy. Something like:

        byte [] tabDelimiter = new byte [1]; tabDelimiter[0] = '\t';
        byte [] newlineDelimiter = new byte[2]; newlineDelimiter[0] = '\n'; newlineDelimiter[1] = '\r';

        while()

        { lineReader.setDelimiter(tabDelimiter); lineReader.readLine(key); lineReader.setDelimiter(newlineDelimiter); lineReader.readLine(value); }

        getting rid of one copy and scan would be pretty big!

        Show
        Joydeep Sen Sarma added a comment - i was pulling these changes into our production environment. quick comment: we copy over an entire line (from readLine) and then we break it into two strings by splitting on tab. So there is an extra scan of the input data and an extra copy based on splitting by tab. instead if we generalized LineReader to instead read until it hits a delimiter - then we can do it with one less scan and copy. Something like: byte [] tabDelimiter = new byte [1] ; tabDelimiter [0] = '\t'; byte [] newlineDelimiter = new byte [2] ; newlineDelimiter [0] = '\n'; newlineDelimiter [1] = '\r'; while() { lineReader.setDelimiter(tabDelimiter); lineReader.readLine(key); lineReader.setDelimiter(newlineDelimiter); lineReader.readLine(value); } getting rid of one copy and scan would be pretty big!

          People

          • Assignee:
            Amareshwari Sriramadasu
            Reporter:
            Amareshwari Sriramadasu
          • Votes:
            0 Vote for this issue
            Watchers:
            0 Start watching this issue

            Dates

            • Created:
              Updated:
              Resolved:

              Development