Hadoop Map/Reduce
MAPREDUCE-4782

NLineInputFormat skips first line of last InputSplit

    Details

    • Type: Bug
    • Status: Closed
    • Priority: Blocker
    • Resolution: Fixed
    • Affects Version/s: 0.22.0, 0.23.0, 1.0.0, 2.0.0-alpha, trunk
    • Fix Version/s: 1.1.1, 3.0.0, 2.0.3-alpha, 0.23.5
    • Component/s: client
    • Labels: None

      Description

      NLineInputFormat creates FileSplits that are then used by LineRecordReader to generate Text values. To deal with an idiosyncrasy of LineRecordReader, the begin and length fields of the FileSplit are constructed differently for the first FileSplit than for the rest.

      After looping through all lines of a file, the final FileSplit is created, but its creation does not respect the difference between how the first FileSplit and the remaining FileSplits are constructed.

      As a result, the first line of the final InputSplit is skipped. I've created a patch to NLineInputFormat that fixes the problem.
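
      The behaviour described above can be illustrated with a small stand-alone model. This is only a sketch, not the Hadoop source: the class NLineSplitSketch and all of its methods are hypothetical, splits are plain {start, length} arrays rather than FileSplit objects, every line is assumed to end in '\n', and the reader is assumed to discard everything up to the first newline whenever a split does not start at offset 0 and to keep reading while its position is <= the split end.

```java
import java.nio.charset.StandardCharsets;
import java.util.ArrayList;
import java.util.List;

// Hypothetical, simplified model of the split/reader interaction; not Hadoop code.
public class NLineSplitSketch {

    // First split: shorten by one byte. Later splits: start one byte early,
    // on the previous line's newline, so the reader's "discard" costs nothing.
    static long[] adjusted(long begin, long length) {
        return begin == 0 ? new long[] {0, length - 1} : new long[] {begin - 1, length};
    }

    // Groups linesPerSplit lines per split. With fixed == false the trailing
    // split is emitted unadjusted, which models the reported bug.
    static List<long[]> splits(byte[] data, int linesPerSplit, boolean fixed) {
        List<long[]> out = new ArrayList<>();
        long begin = 0, length = 0;
        int numLines = 0;
        for (byte b : data) {
            length++;
            if (b == '\n' && ++numLines == linesPerSplit) {
                out.add(adjusted(begin, length));
                begin += length;
                length = 0;
                numLines = 0;
            }
        }
        if (numLines != 0) {
            out.add(fixed ? adjusted(begin, length) : new long[] {begin, length});
        }
        return out;
    }

    // Offset just past the next newline at or after pos (or end of data).
    static int nextLineStart(byte[] data, int pos) {
        while (pos < data.length && data[pos] != '\n') pos++;
        return Math.min(pos + 1, data.length);
    }

    // Assumed reader behaviour: skip to the next line start unless the split
    // begins at 0, then read lines while the position is <= the split end.
    static List<String> read(byte[] data, long[] split) {
        long end = split[0] + split[1];
        int pos = (int) split[0];
        if (pos != 0) pos = nextLineStart(data, pos);
        List<String> lines = new ArrayList<>();
        while (pos <= end && pos < data.length) {
            int next = nextLineStart(data, pos);
            int lineEnd = data[next - 1] == '\n' ? next - 1 : next;
            lines.add(new String(data, pos, lineEnd - pos, StandardCharsets.UTF_8));
            pos = next;
        }
        return lines;
    }

    static List<String> readAll(byte[] data, List<long[]> splits) {
        List<String> all = new ArrayList<>();
        for (long[] s : splits) all.addAll(read(data, s));
        return all;
    }

    public static void main(String[] args) {
        byte[] data = "a\nb\nc\nd\ne\n".getBytes(StandardCharsets.UTF_8);
        System.out.println(readAll(data, splits(data, 2, false))); // prints [a, b, c, d]
        System.out.println(readAll(data, splits(data, 2, true)));  // prints [a, b, c, d, e]
    }
}
```

      In this model the loss only appears when the number of lines is not a multiple of the lines-per-split setting and the file has more than one split, since only then is a trailing split created with a non-zero start.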

      1. MAPREDUCE-4782.patch
        3 kB
        Mark Fuhs
      2. MR-4782.txt
        6 kB
        Robert Joseph Evans
      3. MR-4782-branch-1.txt
        5 kB
        Robert Joseph Evans

        Activity

        Robert Joseph Evans added a comment -

        Marked this as critical, since data loss is serious. Mark, can you post your patch?

        Mark Fuhs added a comment -

        I confess I'm not terribly familiar with git, so this is just a "git diff".

        Robert Joseph Evans added a comment -

        I was able to reproduce the issue, and I have updated the test case to reproduce it as well. The original test case did not check the last split; I don't know why. I also found that the bug exists in branch-1 as well.

        Hadoop QA added a comment -

        -1 overall. Here are the results of testing the latest attachment
        http://issues.apache.org/jira/secure/attachment/12552709/MR-4782.txt
        against trunk revision .

        +1 @author. The patch does not contain any @author tags.

        +1 tests included. The patch appears to include 1 new or modified test files.

        +1 javac. The applied patch does not increase the total number of javac compiler warnings.

        +1 javadoc. The javadoc tool did not generate any warning messages.

        +1 eclipse:eclipse. The patch built with eclipse:eclipse.

        +1 findbugs. The patch does not introduce any new Findbugs (version 1.3.9) warnings.

        +1 release audit. The applied patch does not increase the total number of release audit warnings.

        -1 core tests. The patch failed these unit tests in hadoop-mapreduce-project/hadoop-mapreduce-client/hadoop-mapreduce-client-core hadoop-mapreduce-project/hadoop-mapreduce-client/hadoop-mapreduce-client-jobclient:

        org.apache.hadoop.mapred.TestClusterMRNotification

        +1 contrib tests. The patch passed contrib unit tests.

        Test results: https://builds.apache.org/job/PreCommit-MAPREDUCE-Build/3000//testReport/
        Console output: https://builds.apache.org/job/PreCommit-MAPREDUCE-Build/3000//console

        This message is automatically generated.

        Robert Joseph Evans added a comment -

        The patch looks good to me and I am +1 on it, but I added the test, so I would appreciate it if someone else could take a look.

        Matt Foley added a comment -

        Nasty. Could you please port to branch-1 and I'll include it in the next release?

        Robert Joseph Evans added a comment -

        Patch for branch-1. The patch is identical to the one for trunk except for line numbers and the location of the files.

        Robert Joseph Evans added a comment -

        Also, now that I think about it more, this really is a Blocker, not a Critical.

        Hadoop QA added a comment -

        -1 overall. Here are the results of testing the latest attachment
        http://issues.apache.org/jira/secure/attachment/12552844/MR-4782-branch-1.txt
        against trunk revision .

        -1 patch. The patch command could not apply the patch.

        Console output: https://builds.apache.org/job/PreCommit-MAPREDUCE-Build/3003//console

        This message is automatically generated.

        Jason Lowe added a comment -

        +1, thanks Mark and Bobby. Bobby or Matt, feel free to commit.

        Robert Joseph Evans added a comment -

        Thanks Mark,

        This is a great catch; I just wish we had found it sooner. I put this into trunk, branch-2, branch-0.23, branch-1, and branch-1.1.

        If I missed any branches that people want it in, please let me know and I will see what I can do.

        Hudson added a comment -

        Integrated in Hadoop-trunk-Commit #2988 (See https://builds.apache.org/job/Hadoop-trunk-Commit/2988/)
        MAPREDUCE-4782. NLineInputFormat skips first line of last InputSplit (Mark Fuhs via bobby) (Revision 1407505)

        Result = SUCCESS
        bobby : http://svn.apache.org/viewcvs.cgi/?root=Apache-SVN&view=rev&rev=1407505
        Files :

        • /hadoop/common/trunk/hadoop-mapreduce-project/CHANGES.txt
        • /hadoop/common/trunk/hadoop-mapreduce-project/hadoop-mapreduce-client/hadoop-mapreduce-client-core/src/main/java/org/apache/hadoop/mapreduce/lib/input/NLineInputFormat.java
        • /hadoop/common/trunk/hadoop-mapreduce-project/hadoop-mapreduce-client/hadoop-mapreduce-client-jobclient/src/test/java/org/apache/hadoop/mapreduce/lib/input/TestNLineInputFormat.java
        Mark Fuhs added a comment -

        I'm glad I could contribute!

        Hudson added a comment -

        Integrated in Hadoop-Mapreduce-trunk #1252 (See https://builds.apache.org/job/Hadoop-Mapreduce-trunk/1252/)
        MAPREDUCE-4782. NLineInputFormat skips first line of last InputSplit (Mark Fuhs via bobby) (Revision 1407505)

        Result = FAILURE
        bobby : http://svn.apache.org/viewcvs.cgi/?root=Apache-SVN&view=rev&rev=1407505
        Files :

        • /hadoop/common/trunk/hadoop-mapreduce-project/CHANGES.txt
        • /hadoop/common/trunk/hadoop-mapreduce-project/hadoop-mapreduce-client/hadoop-mapreduce-client-core/src/main/java/org/apache/hadoop/mapreduce/lib/input/NLineInputFormat.java
        • /hadoop/common/trunk/hadoop-mapreduce-project/hadoop-mapreduce-client/hadoop-mapreduce-client-jobclient/src/test/java/org/apache/hadoop/mapreduce/lib/input/TestNLineInputFormat.java
        Hudson added a comment -

        Integrated in Hadoop-Yarn-trunk #32 (See https://builds.apache.org/job/Hadoop-Yarn-trunk/32/)
        MAPREDUCE-4782. NLineInputFormat skips first line of last InputSplit (Mark Fuhs via bobby) (Revision 1407505)

        Result = SUCCESS
        bobby : http://svn.apache.org/viewcvs.cgi/?root=Apache-SVN&view=rev&rev=1407505
        Files :

        • /hadoop/common/trunk/hadoop-mapreduce-project/CHANGES.txt
        • /hadoop/common/trunk/hadoop-mapreduce-project/hadoop-mapreduce-client/hadoop-mapreduce-client-core/src/main/java/org/apache/hadoop/mapreduce/lib/input/NLineInputFormat.java
        • /hadoop/common/trunk/hadoop-mapreduce-project/hadoop-mapreduce-client/hadoop-mapreduce-client-jobclient/src/test/java/org/apache/hadoop/mapreduce/lib/input/TestNLineInputFormat.java
        Hudson added a comment -

        Integrated in Hadoop-Hdfs-0.23-Build #431 (See https://builds.apache.org/job/Hadoop-Hdfs-0.23-Build/431/)
        svn merge -c 1407505 FIXES: MAPREDUCE-4782. NLineInputFormat skips first line of last InputSplit (Mark Fuhs via bobby) (Revision 1407507)

        Result = UNSTABLE
        bobby : http://svn.apache.org/viewcvs.cgi/?root=Apache-SVN&view=rev&rev=1407507
        Files :

        • /hadoop/common/branches/branch-0.23/hadoop-mapreduce-project/CHANGES.txt
        • /hadoop/common/branches/branch-0.23/hadoop-mapreduce-project/hadoop-mapreduce-client/hadoop-mapreduce-client-core/src/main/java/org/apache/hadoop/mapreduce/lib/input/NLineInputFormat.java
        • /hadoop/common/branches/branch-0.23/hadoop-mapreduce-project/hadoop-mapreduce-client/hadoop-mapreduce-client-jobclient/src/test/java/org/apache/hadoop/mapreduce/lib/input/TestNLineInputFormat.java
        Hudson added a comment -

        Integrated in Hadoop-Hdfs-trunk #1222 (See https://builds.apache.org/job/Hadoop-Hdfs-trunk/1222/)
        MAPREDUCE-4782. NLineInputFormat skips first line of last InputSplit (Mark Fuhs via bobby) (Revision 1407505)

        Result = SUCCESS
        bobby : http://svn.apache.org/viewcvs.cgi/?root=Apache-SVN&view=rev&rev=1407505
        Files :

        • /hadoop/common/trunk/hadoop-mapreduce-project/CHANGES.txt
        • /hadoop/common/trunk/hadoop-mapreduce-project/hadoop-mapreduce-client/hadoop-mapreduce-client-core/src/main/java/org/apache/hadoop/mapreduce/lib/input/NLineInputFormat.java
        • /hadoop/common/trunk/hadoop-mapreduce-project/hadoop-mapreduce-client/hadoop-mapreduce-client-jobclient/src/test/java/org/apache/hadoop/mapreduce/lib/input/TestNLineInputFormat.java
        Matt Foley added a comment -

        Closed upon release of 1.1.1.


          People

          • Assignee: Mark Fuhs
          • Reporter: Mark Fuhs
          • Votes: 0
          • Watchers: 7
