Hadoop Common / HADOOP-8654

TextInputFormat delimiter bug: input text portion ends with, and delimiter starts with, the same character or character sequence

    Details

    • Type: Bug
    • Status: Closed
    • Priority: Major
    • Resolution: Fixed
    • Affects Version/s: 0.20.204.0, 1.0.3, 0.21.0, 2.0.0-alpha
    • Fix Version/s: 2.0.2-alpha
    • Component/s: util
    • Labels:
    • Environment: Linux
    • Target Version/s:
    • Tags: TextInputFormat record delimiter

      Description

      TextInputFormat delimiter bug scenario: the input text contains a character sequence in which the first character matches the first character of the delimiter, and the remaining characters match the entire delimiter from its starting position.

      e.g. delimiter = "record"
      and Text =" record 1:- name = Gelesh e mail = gelesh.hadoop@gmail.com Location Bangalore record 2: name = sdf .. location =Bangalorrecord 3: name .... "

      Here the string "=Bangalorrecord 3: " satisfies two conditions:
      1) It contains the delimiter "record".
      2) The character or character sequence immediately before the delimiter (i.e. 'r') matches the first character (or character sequence) of the delimiter; that is, "=Bangalor" ends with, and the delimiter starts with, the same character 'r'.

      In this case the program does not detect the delimiter, so the value text passed to the map is wrong: it still contains the delimiter.
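
The failure mode described above can be sketched in a few lines of Java. This is a minimal illustration under stated assumptions, not the actual LineReader code and not the attached patch: the class and the methods `splitBuggy` and `splitFixed` are hypothetical names, and the input is a shortened version of the example text. The buggy scanner drops a partial match on a mismatch without re-testing the mismatching character as a possible start of the delimiter; the corrected scanner rewinds to just after the position where the partial match began.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical sketch; not Hadoop's LineReader.
final class DelimiterSplitSketch {

    // Buggy scan: on a mismatch the partial match is discarded and the
    // current character is never re-tested against the delimiter start.
    static List<String> splitBuggy(String text, String delim) {
        List<String> records = new ArrayList<>();
        int start = 0;    // start of the current record
        int matched = 0;  // number of delimiter characters matched so far
        for (int i = 0; i < text.length(); i++) {
            if (text.charAt(i) == delim.charAt(matched)) {
                matched++;
                if (matched == delim.length()) {
                    records.add(text.substring(start, i + 1 - matched));
                    start = i + 1;
                    matched = 0;
                }
            } else {
                matched = 0; // the character at i is skipped
            }
        }
        records.add(text.substring(start));
        return records;
    }

    // Corrected scan: on a mismatch after a partial match, rewind so the
    // loop resumes one character past where the partial match started.
    static List<String> splitFixed(String text, String delim) {
        List<String> records = new ArrayList<>();
        int start = 0;
        int matched = 0;
        for (int i = 0; i < text.length(); i++) {
            if (text.charAt(i) == delim.charAt(matched)) {
                matched++;
                if (matched == delim.length()) {
                    records.add(text.substring(start, i + 1 - matched));
                    start = i + 1;
                    matched = 0;
                }
            } else if (matched != 0) {
                i -= matched; // the loop increment then moves one past the partial-match start
                matched = 0;
            }
        }
        records.add(text.substring(start));
        return records;
    }

    public static void main(String[] args) {
        String input =
            "record 1: Location Bangalore record 2: location =Bangalorrecord 3: name";
        System.out.println("buggy: " + splitBuggy(input, "record"));
        System.out.println("fixed: " + splitFixed(input, "record"));
    }
}
```

With this input, `splitBuggy` returns a last record that still contains "record 3: name", while `splitFixed` returns a separate record for each occurrence of the delimiter. Both lists start with an empty string because the input begins with the delimiter.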

      1. MAPREDUCE-4512.txt
        0.7 kB
        Gelesh
      2. HADOOP-8654.patch
        3 kB
        Jason Lowe

        Activity

        Gelesh added a comment -

        Thanks Arun...

        Hudson added a comment -

        Integrated in Hadoop-Mapreduce-trunk #1169 (See https://builds.apache.org/job/Hadoop-Mapreduce-trunk/1169/)
        HADOOP-8654. TextInputFormat delimiter bug (Gelesh and Jason Lowe via bobby) (Revision 1373859)

        Result = FAILURE
        bobby : http://svn.apache.org/viewcvs.cgi/?root=Apache-SVN&view=rev&rev=1373859
        Files :

        • /hadoop/common/trunk/hadoop-common-project/hadoop-common/CHANGES.txt
        • /hadoop/common/trunk/hadoop-common-project/hadoop-common/src/main/java/org/apache/hadoop/util/LineReader.java
        • /hadoop/common/trunk/hadoop-common-project/hadoop-common/src/test/java/org/apache/hadoop/util/TestLineReader.java
        Hudson added a comment -

        Integrated in Hadoop-Hdfs-trunk #1137 (See https://builds.apache.org/job/Hadoop-Hdfs-trunk/1137/)
        HADOOP-8654. TextInputFormat delimiter bug (Gelesh and Jason Lowe via bobby) (Revision 1373859)

        Result = FAILURE
        bobby : http://svn.apache.org/viewcvs.cgi/?root=Apache-SVN&view=rev&rev=1373859
        Files :

        • /hadoop/common/trunk/hadoop-common-project/hadoop-common/CHANGES.txt
        • /hadoop/common/trunk/hadoop-common-project/hadoop-common/src/main/java/org/apache/hadoop/util/LineReader.java
        • /hadoop/common/trunk/hadoop-common-project/hadoop-common/src/test/java/org/apache/hadoop/util/TestLineReader.java
        Hudson added a comment -

        Integrated in Hadoop-Mapreduce-trunk-Commit #2617 (See https://builds.apache.org/job/Hadoop-Mapreduce-trunk-Commit/2617/)
        HADOOP-8654. TextInputFormat delimiter bug (Gelesh and Jason Lowe via bobby) (Revision 1373859)

        Result = FAILURE
        bobby : http://svn.apache.org/viewcvs.cgi/?root=Apache-SVN&view=rev&rev=1373859
        Files :

        • /hadoop/common/trunk/hadoop-common-project/hadoop-common/CHANGES.txt
        • /hadoop/common/trunk/hadoop-common-project/hadoop-common/src/main/java/org/apache/hadoop/util/LineReader.java
        • /hadoop/common/trunk/hadoop-common-project/hadoop-common/src/test/java/org/apache/hadoop/util/TestLineReader.java
        Hudson added a comment -

        Integrated in Hadoop-Hdfs-trunk-Commit #2652 (See https://builds.apache.org/job/Hadoop-Hdfs-trunk-Commit/2652/)
        HADOOP-8654. TextInputFormat delimiter bug (Gelesh and Jason Lowe via bobby) (Revision 1373859)

        Result = SUCCESS
        bobby : http://svn.apache.org/viewcvs.cgi/?root=Apache-SVN&view=rev&rev=1373859
        Files :

        • /hadoop/common/trunk/hadoop-common-project/hadoop-common/CHANGES.txt
        • /hadoop/common/trunk/hadoop-common-project/hadoop-common/src/main/java/org/apache/hadoop/util/LineReader.java
        • /hadoop/common/trunk/hadoop-common-project/hadoop-common/src/test/java/org/apache/hadoop/util/TestLineReader.java
        Hudson added a comment -

        Integrated in Hadoop-Common-trunk-Commit #2587 (See https://builds.apache.org/job/Hadoop-Common-trunk-Commit/2587/)
        HADOOP-8654. TextInputFormat delimiter bug (Gelesh and Jason Lowe via bobby) (Revision 1373859)

        Result = SUCCESS
        bobby : http://svn.apache.org/viewcvs.cgi/?root=Apache-SVN&view=rev&rev=1373859
        Files :

        • /hadoop/common/trunk/hadoop-common-project/hadoop-common/CHANGES.txt
        • /hadoop/common/trunk/hadoop-common-project/hadoop-common/src/main/java/org/apache/hadoop/util/LineReader.java
        • /hadoop/common/trunk/hadoop-common-project/hadoop-common/src/test/java/org/apache/hadoop/util/TestLineReader.java
        Robert Joseph Evans added a comment -

        Thanks Gelesh and Jason, +1

        I put this into trunk and branch-2

        Gelesh added a comment -

        Thanks Jason Lowe, I have run the test case you uploaded. The error and the solution hold good.
        Hope we can close this.

        Hadoop QA added a comment -

        +1 overall. Here are the results of testing the latest attachment
        http://issues.apache.org/jira/secure/attachment/12541140/HADOOP-8654.patch
        against trunk revision .

        +1 @author. The patch does not contain any @author tags.

        +1 tests included. The patch appears to include 1 new or modified test files.

        +1 javac. The applied patch does not increase the total number of javac compiler warnings.

        +1 javadoc. The javadoc tool did not generate any warning messages.

        +1 eclipse:eclipse. The patch built with eclipse:eclipse.

        +1 findbugs. The patch does not introduce any new Findbugs (version 1.3.9) warnings.

        +1 release audit. The applied patch does not increase the total number of release audit warnings.

        +1 core tests. The patch passed unit tests in hadoop-common-project/hadoop-common.

        +1 contrib tests. The patch passed contrib unit tests.

        Test results: https://builds.apache.org/job/PreCommit-HADOOP-Build/1311//testReport/
        Console output: https://builds.apache.org/job/PreCommit-HADOOP-Build/1311//console

        This message is automatically generated.

        Jason Lowe added a comment -

        "The confusion is that this error is input-file based, and we need to supply an error-case-based input."

        We don't need a full-blown MapReduce job to perform a unit test of the fix. The issue is localized to LineReader, so let's write a unit test for that. Rather than using a file as input, we can feed it a string of characters written into the test code directly.

        I've attached an updated patch with a testcase.
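
One way to picture the approach described above, testing the reader directly against a string held in the test code instead of an input file, is the sketch below. It does not use Hadoop's LineReader or the attached patch: `readRecord` is a hypothetical stand-in for the reader under test, and the class and method names are assumptions made for illustration. The stand-in favors simplicity over speed by re-checking the buffer tail after every byte.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.InputStream;
import java.io.UncheckedIOException;
import java.nio.charset.StandardCharsets;
import java.util.ArrayList;
import java.util.List;

// Hypothetical sketch; the reader here is a stand-in, not Hadoop's LineReader.
final class InMemoryDelimiterCheck {

    // Reads bytes until the buffered data ends with the delimiter or the
    // stream ends. Returns the record without the delimiter, or null when
    // the stream was already exhausted.
    static String readRecord(InputStream in, byte[] delim) throws IOException {
        ByteArrayOutputStream buf = new ByteArrayOutputStream();
        boolean sawByte = false;
        int b;
        while ((b = in.read()) != -1) {
            sawByte = true;
            buf.write(b);
            byte[] all = buf.toByteArray(); // copies on every byte; acceptable for a small test input
            if (endsWith(all, delim)) {
                return new String(all, 0, all.length - delim.length, StandardCharsets.UTF_8);
            }
        }
        return sawByte ? new String(buf.toByteArray(), StandardCharsets.UTF_8) : null;
    }

    static boolean endsWith(byte[] data, byte[] suffix) {
        if (data.length < suffix.length) {
            return false;
        }
        int offset = data.length - suffix.length;
        for (int i = 0; i < suffix.length; i++) {
            if (data[offset + i] != suffix[i]) {
                return false;
            }
        }
        return true;
    }

    // Feeds the test string to the reader through an in-memory stream.
    static List<String> readAll(String text, String delimiter) {
        InputStream in = new ByteArrayInputStream(text.getBytes(StandardCharsets.UTF_8));
        byte[] delim = delimiter.getBytes(StandardCharsets.UTF_8);
        List<String> records = new ArrayList<>();
        try {
            String record;
            while ((record = readRecord(in, delim)) != null) {
                records.add(record);
            }
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return records;
    }

    public static void main(String[] args) {
        String input = "record 1 Location:UAErecord 2 Location:Bangalorrecord 3 Location:Kerala";
        System.out.println(readAll(input, "record"));
    }
}
```

The point of the design is that the error case lives in the test source as a string literal, so no file system or MapReduce job is needed to reproduce it.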

        Gelesh added a comment -

        I clicked on the Resolved button by mistake, so I have reopened the issue.
        To change the status to Patch Available, I am resubmitting the same patch.
        I apologize.

        Gelesh added a comment -

        I was searching for resolved issues,
        and for that I clicked on Resolved issue.
        My apologies.

        Gelesh added a comment -

        I could write a MapReduce job for testing,
        with the code below in the MapReduce driver:

        // Write the test input to a file on the local file system.
        Path inputDirectory = new Path("TestDirectory", "input");
        Path file = new Path(inputDirectory, "InputFile.txt");
        Writer writer = new OutputStreamWriter(localFs.create(file));
        writer.write("The Required Very Big Input String"); // Fingers crossed
        writer.close(); // flush the input before the job reads it

        // Open the job's output file for reading.
        Path outFile = new Path(outputTestDirectory, "part-r-00000");
        Reader reader = new InputStreamReader(localFs.open(outFile));

        Is this okay?

        Gelesh added a comment -

        Could you please share a Java test file, or a link to one, that I can refer to?
        The confusion is that this error is input-file based, and we need to supply an error-case-based input.
        A link to an existing test case that follows the new test case rules on the Apache wiki would help.

        Bhallamudi Venkata Siva Kamesh added a comment -

        Hi Gelesh,

        If there are any test failures, you can access them through the Test results URL.

        -1 core tests. The patch failed these unit tests in hadoop-common-project/hadoop-common:org.apache.hadoop.ha.TestZKFailoverController

        The above test failure seems to be unrelated to this patch.

        The patch does not contain any testcase. Please update your patch with a testcase.

        Gelesh added a comment -

        Kindly provide the details, or a URL to access them, for the failed test case
        org.apache.hadoop.ha.TestZKFailoverController,
        including the source code, the input supplied, the expected output, etc.
        Thank you

        Hadoop QA added a comment -

        -1 overall. Here are the results of testing the latest attachment
        http://issues.apache.org/jira/secure/attachment/12539059/MAPREDUCE-4512.txt
        against trunk revision .

        +1 @author. The patch does not contain any @author tags.

        -1 tests included. The patch doesn't appear to include any new or modified tests.
        Please justify why no new tests are needed for this patch.
        Also please list what manual steps were performed to verify this patch.

        +1 javac. The applied patch does not increase the total number of javac compiler warnings.

        +1 javadoc. The javadoc tool did not generate any warning messages.

        +1 eclipse:eclipse. The patch built with eclipse:eclipse.

        +1 findbugs. The patch does not introduce any new Findbugs (version 1.3.9) warnings.

        +1 release audit. The applied patch does not increase the total number of release audit warnings.

        -1 core tests. The patch failed these unit tests in hadoop-common-project/hadoop-common:

        org.apache.hadoop.ha.TestZKFailoverController

        +1 contrib tests. The patch passed contrib unit tests.

        Test results: https://builds.apache.org/job/PreCommit-HADOOP-Build/1252//testReport/
        Console output: https://builds.apache.org/job/PreCommit-HADOOP-Build/1252//console

        This message is automatically generated.

        Jason Lowe added a comment -

        Moving to project Hadoop Common since that's where the patch needs to be applied.

        In the future, please don't set the Reviewed flag unless the patch has been reviewed and approved by someone in the community. I see no record of that occurring, so I've cleared that flag. Also, the Fix Versions field is intended to mark where the patch has been integrated, so please don't set it. If you'd like to indicate which versions you'd like the patch committed to, use the Target Versions field.

        Bhallamudi Venkata Siva Kamesh added a comment -

        Please update the patch with a Testcase.

        Sonu Prathap added a comment -

        I am also facing a similar issue. Please help me recreate the fixed code using the patch.

        Gelesh added a comment -

        Test case
        input file text
        record 1 name: Java Location:UAErecord 2 name:Gelesh Location:Bangalorrecord 3 name Hadoop Location:Kerala

        Delimiter = "record"

        expected values in map
        1 name: Java Location:UAE
        2 name:Gelesh Location:Bangalor
        3 name Hadoop Location:Kerala

        Actual values received in map
        1 name: Java Location:UAE
        2 name:Gelesh Location:Bangalorrecord 3 name Hadoop Location:Kerala

        Hadoop QA added a comment -

        -1 overall. Here are the results of testing the latest attachment
        http://issues.apache.org/jira/secure/attachment/12539059/MAPREDUCE-4512.txt
        against trunk revision .

        +1 @author. The patch does not contain any @author tags.

        -1 tests included. The patch doesn't appear to include any new or modified tests.
        Please justify why no new tests are needed for this patch.
        Also please list what manual steps were performed to verify this patch.

        +1 javac. The applied patch does not increase the total number of javac compiler warnings.

        +1 javadoc. The javadoc tool did not generate any warning messages.

        +1 eclipse:eclipse. The patch built with eclipse:eclipse.

        +1 findbugs. The patch does not introduce any new Findbugs (version 1.3.9) warnings.

        +1 release audit. The applied patch does not increase the total number of release audit warnings.

        +1 core tests. The patch passed unit tests in hadoop-common-project/hadoop-common.

        +1 contrib tests. The patch passed contrib unit tests.

        Test results: https://builds.apache.org/job/PreCommit-MAPREDUCE-Build/2706//testReport/
        Console output: https://builds.apache.org/job/PreCommit-MAPREDUCE-Build/2706//console

        This message is automatically generated.

        Gelesh added a comment -

        Just a one-line code change in LineReader. Tested. In case there is any issue, please mail me at gelesh.hadoop@gmail.com

        Gelesh added a comment -

        Just one line of code change in LineReader would do. Tested.
        If there are any issues, please let me know so I can help further:
        gelesh.hadoop@gmail.com


          People

          • Assignee: Unassigned
          • Reporter: Gelesh
          • Votes: 0
          • Watchers: 8

              Time Tracking

              • Original Estimate: 1m
              • Remaining Estimate: 1m
              • Time Spent: Not Specified
