Hadoop Map/Reduce: MAPREDUCE-2602

Allow setting of end-of-record delimiter for TextInputFormat (for the old API)

    Details

    • Type: Improvement
    • Status: Closed
    • Priority: Major
    • Resolution: Fixed
    • Affects Version/s: None
    • Fix Version/s: 0.23.0
    • Component/s: None
    • Labels: None
    • Hadoop Flags: Reviewed

      Description

      Since there are users who are still using the old MR API, it will be useful to modify the org.apache.hadoop.mapred.LineRecordReader and org.apache.hadoop.mapred.TextInputFormat to be able to use custom (user-specified) end-of-record delimiters. This will make use of the LineReader improvement introduced in HADOOP-7096 that enables the LineReader to break lines at user-specified delimiters.

      Note: MAPREDUCE-2254 already added this improvement to the new API (but not the old API).
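
      To illustrate what breaking input at a user-specified end-of-record delimiter means, here is a minimal standalone sketch. It does not use any Hadoop classes and is not the LineReader implementation from HADOOP-7096; the class name DelimitedRecords and the method readRecords are hypothetical names chosen for this example. It assumes the input fits in memory and that the delimiter is a non-empty UTF-8 string.

```java
import java.nio.charset.StandardCharsets;
import java.util.ArrayList;
import java.util.List;

// Hypothetical illustration only: splits a byte buffer into records at a
// custom (possibly multi-byte) delimiter instead of at newlines.
class DelimitedRecords {

    // Returns the records found in data, ending each one at delimiter.
    // A trailing delimiter does not produce an extra empty record.
    static List<String> readRecords(byte[] data, String delimiter) {
        byte[] delim = delimiter.getBytes(StandardCharsets.UTF_8);
        if (delim.length == 0) {
            throw new IllegalArgumentException("delimiter must not be empty");
        }
        List<String> records = new ArrayList<>();
        int start = 0; // first byte of the record currently being read
        int i = 0;     // current scan position
        while (i + delim.length <= data.length) {
            if (matchesAt(data, i, delim)) {
                records.add(new String(data, start, i - start, StandardCharsets.UTF_8));
                i += delim.length; // skip over the delimiter itself
                start = i;
            } else {
                i++;
            }
        }
        if (start < data.length) {
            // Last record ended at end of input rather than at a delimiter.
            records.add(new String(data, start, data.length - start, StandardCharsets.UTF_8));
        }
        return records;
    }

    // True if delim occurs in data starting exactly at offset.
    private static boolean matchesAt(byte[] data, int offset, byte[] delim) {
        for (int j = 0; j < delim.length; j++) {
            if (data[offset + j] != delim[j]) {
                return false;
            }
        }
        return true;
    }

    public static void main(String[] args) {
        byte[] input = "id=1;name=a||id=2;name=b||id=3".getBytes(StandardCharsets.UTF_8);
        // Prints three records: "id=1;name=a", "id=2;name=b", "id=3"
        for (String record : readRecords(input, "||")) {
            System.out.println(record);
        }
    }
}
```

      In the actual feature, the record reader would receive the delimiter from the job configuration and hand it to the underlying line reader; the sketch above only shows the splitting behaviour a user would observe.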

      1. MAPREDUCE-2602_rev2.patch
        10 kB
        Ahmed Radwan
      2. MAPREDUCE-2602.patch
        10 kB
        Ahmed Radwan

        Activity

        Ahmed Radwan added a comment -

        This batch is backward compatible.

        Ahmed Radwan added a comment -

        This patch is backward compatible.

        Hadoop QA added a comment -

        -1 overall. Here are the results of testing the latest attachment
        http://issues.apache.org/jira/secure/attachment/12482890/MAPREDUCE-2602.patch
        against trunk revision 1136261.

        +1 @author. The patch does not contain any @author tags.

        +1 tests included. The patch appears to include 2 new or modified tests.

        +1 javadoc. The javadoc tool did not generate any warning messages.

        +1 javac. The applied patch does not increase the total number of javac compiler warnings.

        +1 findbugs. The patch does not introduce any new Findbugs (version 1.3.9) warnings.

        +1 release audit. The applied patch does not increase the total number of release audit warnings.

        -1 core tests. The patch failed these core unit tests:
        org.apache.hadoop.cli.TestMRCLI
        org.apache.hadoop.fs.TestFileSystem

        -1 contrib tests. The patch failed contrib unit tests.

        +1 system test framework. The patch passed system test framework compile.

        Test results: https://builds.apache.org/job/PreCommit-MAPREDUCE-Build/402//testReport/
        Findbugs warnings: https://builds.apache.org/job/PreCommit-MAPREDUCE-Build/402//artifact/trunk/build/test/findbugs/newPatchFindbugsWarnings.html
        Console output: https://builds.apache.org/job/PreCommit-MAPREDUCE-Build/402//console

        This message is automatically generated.

        Tom White added a comment -

        +1

        Ahmed Radwan added a comment -

        The Hadoop-QA test failures above are not related to the submitted patch.

        Harsh J added a comment -

        The changes include constructor changes to the LineReader class. Is that OK to go in without an Incompat mark? It's an internal class, but it's been public so far.

        The rest of the changes all appear good.

        Ahmed Radwan added a comment -

        Many thanks Harsh

        I have updated the patch to keep old constructors unchanged, so it'll remain compatible.

        Hadoop QA added a comment -

        -1 overall. Here are the results of testing the latest attachment
        http://issues.apache.org/jira/secure/attachment/12485529/MAPREDUCE-2602_rev2.patch
        against trunk revision 1143252.

        +1 @author. The patch does not contain any @author tags.

        +1 tests included. The patch appears to include 2 new or modified tests.

        +1 javadoc. The javadoc tool did not generate any warning messages.

        +1 javac. The applied patch does not increase the total number of javac compiler warnings.

        +1 findbugs. The patch does not introduce any new Findbugs (version 1.3.9) warnings.

        +1 release audit. The applied patch does not increase the total number of release audit warnings.

        -1 core tests. The patch failed these core unit tests:
        org.apache.hadoop.cli.TestMRCLI
        org.apache.hadoop.fs.TestFileSystem

        -1 contrib tests. The patch failed contrib unit tests.

        +1 system test framework. The patch passed system test framework compile.

        Test results: https://builds.apache.org/job/PreCommit-MAPREDUCE-Build/439//testReport/
        Findbugs warnings: https://builds.apache.org/job/PreCommit-MAPREDUCE-Build/439//artifact/trunk/build/test/findbugs/newPatchFindbugsWarnings.html
        Console output: https://builds.apache.org/job/PreCommit-MAPREDUCE-Build/439//console

        This message is automatically generated.

        Tom White added a comment -

        +1 to the new patch.

        Harsh J added a comment -

        +1 from me too, that change covered my comment!

        Todd Lipcon added a comment -

        Committed to trunk based on +1s above. Thanks, Ahmed!

        Hudson added a comment -

        Integrated in Hadoop-Mapreduce-trunk-Commit #756 (See https://builds.apache.org/job/Hadoop-Mapreduce-trunk-Commit/756/)
        MAPREDUCE-2602. Allow setting of end-of-record delimiter for TextInputFormat for the old API. Contributed by Ahmed Radwan.

        todd : http://svn.apache.org/viewcvs.cgi/?root=Apache-SVN&view=rev&rev=1150926
        Files :

        • /hadoop/common/trunk/mapreduce/src/java/org/apache/hadoop/mapred/TextInputFormat.java
        • /hadoop/common/trunk/mapreduce/src/java/org/apache/hadoop/mapred/LineRecordReader.java
        • /hadoop/common/trunk/mapreduce/CHANGES.txt
        • /hadoop/common/trunk/mapreduce/src/test/mapred/org/apache/hadoop/mapred/TestLineRecordReader.java
        Hudson added a comment -

        Integrated in Hadoop-Mapreduce-trunk #749 (See https://builds.apache.org/job/Hadoop-Mapreduce-trunk/749/)
        MAPREDUCE-2602. Allow setting of end-of-record delimiter for TextInputFormat for the old API. Contributed by Ahmed Radwan.

        todd : http://svn.apache.org/viewcvs.cgi/?root=Apache-SVN&view=rev&rev=1150926
        Files :

        • /hadoop/common/trunk/mapreduce/src/java/org/apache/hadoop/mapred/TextInputFormat.java
        • /hadoop/common/trunk/mapreduce/src/java/org/apache/hadoop/mapred/LineRecordReader.java
        • /hadoop/common/trunk/mapreduce/CHANGES.txt
        • /hadoop/common/trunk/mapreduce/src/test/mapred/org/apache/hadoop/mapred/TestLineRecordReader.java

          People

          • Assignee: Ahmed Radwan
          • Reporter: Ahmed Radwan
          • Votes: 0
          • Watchers: 4
