Hadoop Map/Reduce / MAPREDUCE-6635

Unsafe long to int conversion in UncompressedSplitLineReader and IndexOutOfBoundsException

    Details

    • Type: Bug
    • Status: Closed
    • Priority: Critical
    • Resolution: Fixed
    • Affects Version/s: None
    • Fix Version/s: 2.8.0, 2.7.3, 2.6.5, 3.0.0-alpha1
    • Component/s: None
    • Labels:
      None

      Description

      LineRecordReader creates the reader for uncompressed splits like so:

            in = new UncompressedSplitLineReader(
                fileIn, job, recordDelimiter, split.getLength());
      

      The split length is stored in a long field:

        private long splitLength;
      

      At some point when reading the first line, fillBuffer does this:

        @Override
        protected int fillBuffer(InputStream in, byte[] buffer, boolean inDelimiter)
            throws IOException {
          int maxBytesToRead = buffer.length;
          if (totalBytesRead < splitLength) {
            maxBytesToRead = Math.min(maxBytesToRead,
                                      (int)(splitLength - totalBytesRead));
      

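      The cast to int overflows when splitLength - totalBytesRead exceeds Integer.MAX_VALUE, so for large splits the result can be negative. Math.min then selects that negative value as maxBytesToRead, and the subsequent DFS read fails a bounds check: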

      java.lang.IndexOutOfBoundsException
              at java.nio.Buffer.checkBounds(Buffer.java:559)
              at java.nio.ByteBuffer.get(ByteBuffer.java:668)
              at java.nio.DirectByteBuffer.get(DirectByteBuffer.java:279)
              at org.apache.hadoop.hdfs.RemoteBlockReader2.read(RemoteBlockReader2.java:172)
              at org.apache.hadoop.hdfs.DFSInputStream$ByteArrayStrategy.doRead(DFSInputStream.java:744)
              at org.apache.hadoop.hdfs.DFSInputStream.readBuffer(DFSInputStream.java:800)
              at org.apache.hadoop.hdfs.DFSInputStream.readWithStrategy(DFSInputStream.java:860)
              at org.apache.hadoop.hdfs.DFSInputStream.read(DFSInputStream.java:903)
              at java.io.DataInputStream.read(DataInputStream.java:149)
              at org.apache.hadoop.mapreduce.lib.input.UncompressedSplitLineReader.fillBuffer(UncompressedSplitLineReader.java:59)
              at org.apache.hadoop.util.LineReader.readDefaultLine(LineReader.java:216)
              at org.apache.hadoop.util.LineReader.readLine(LineReader.java:174)
              at org.apache.hadoop.mapreduce.lib.input.UncompressedSplitLineReader.readLine(UncompressedSplitLineReader.java:91)
              at org.apache.hadoop.mapreduce.lib.input.LineRecordReader.skipUtfByteOrderMark(LineRecordReader.java:144)
              at org.apache.hadoop.mapreduce.lib.input.LineRecordReader.nextKeyValue(LineRecordReader.java:184)
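
The overflow described above can be reproduced outside Hadoop. The following is a minimal sketch, not the committed patch: the class SplitLengthOverflowDemo and both helper methods are hypothetical names introduced for illustration, and the 3 GiB split length and 64 KiB buffer size are assumed values. It contrasts the cast from the report with one possible safe variant that compares as long before narrowing, and shows that a standard java.io stream rejects a negative read length.

```java
import java.io.ByteArrayInputStream;

// Hypothetical demo class; not part of Hadoop.
class SplitLengthOverflowDemo {

  // Mirrors the expression from the report: the long difference is
  // narrowed to int before Math.min gets a chance to clamp it.
  static int unsafeMaxBytesToRead(int bufferLength, long splitLength, long totalBytesRead) {
    int maxBytesToRead = bufferLength;
    if (totalBytesRead < splitLength) {
      maxBytesToRead = Math.min(maxBytesToRead, (int) (splitLength - totalBytesRead));
    }
    return maxBytesToRead;
  }

  // One possible safe variant (an assumption, not necessarily the actual fix):
  // take the minimum as long, so the cast only sees a value <= bufferLength.
  static int safeMaxBytesToRead(int bufferLength, long splitLength, long totalBytesRead) {
    int maxBytesToRead = bufferLength;
    if (totalBytesRead < splitLength) {
      maxBytesToRead = (int) Math.min((long) maxBytesToRead, splitLength - totalBytesRead);
    }
    return maxBytesToRead;
  }

  public static void main(String[] args) {
    int bufferLength = 64 * 1024;                // assumed buffer size
    long splitLength = 3L * 1024 * 1024 * 1024;  // 3 GiB, larger than Integer.MAX_VALUE

    int unsafe = unsafeMaxBytesToRead(bufferLength, splitLength, 0L);
    int safe = safeMaxBytesToRead(bufferLength, splitLength, 0L);
    System.out.println("unsafe = " + unsafe);
    System.out.println("safe   = " + safe);

    // A negative length is rejected by the stream's bounds check.
    byte[] buffer = new byte[bufferLength];
    try {
      new ByteArrayInputStream(new byte[16]).read(buffer, 0, unsafe);
    } catch (IndexOutOfBoundsException e) {
      System.out.println("negative read length rejected: " + e);
    }
  }
}
```

Note that the narrowed value is not always negative: a difference such as 2^32 + 5 would narrow to 5, which silently limits reads to a few bytes instead of failing.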
      

      This has been reported at https://issues.streamsets.com/browse/SDC-2229. It also happens in Hive when a very large text file is forced to be read as a single split (e.g. via the header-skipping feature, or via set mapred.min.split.size=9999999999999999).

        Activity

        djp Junping Du added a comment -

        Thanks Sergey Shelukhin for reporting this issue. I have uploaded a patch that fixes it, with a unit test that reproduces the issue.

        hadoopqa Hadoop QA added a comment -
        -1 overall



        Vote | Subsystem  | Runtime | Comment
        0    | reexec     | 19m 34s | Docker mode activated.
        +1   | @author    | 0m 0s   | The patch does not contain any @author tags.
        +1   | test4tests | 0m 0s   | The patch appears to include 1 new or modified test files.
        +1   | mvninstall | 7m 14s  | trunk passed
        +1   | compile    | 0m 22s  | trunk passed with JDK v1.8.0_72
        +1   | compile    | 0m 24s  | trunk passed with JDK v1.7.0_95
        +1   | checkstyle | 0m 18s  | trunk passed
        +1   | mvnsite    | 0m 34s  | trunk passed
        +1   | mvneclipse | 0m 13s  | trunk passed
        +1   | findbugs   | 1m 9s   | trunk passed
        +1   | javadoc    | 0m 25s  | trunk passed with JDK v1.8.0_72
        +1   | javadoc    | 0m 26s  | trunk passed with JDK v1.7.0_95
        +1   | mvninstall | 0m 27s  | the patch passed
        +1   | compile    | 0m 21s  | the patch passed with JDK v1.8.0_72
        +1   | javac      | 0m 21s  | the patch passed
        +1   | compile    | 0m 24s  | the patch passed with JDK v1.7.0_95
        +1   | javac      | 0m 24s  | the patch passed
        +1   | checkstyle | 0m 15s  | the patch passed
        +1   | mvnsite    | 0m 31s  | the patch passed
        +1   | mvneclipse | 0m 11s  | the patch passed
        +1   | whitespace | 0m 0s   | Patch has no whitespace issues.
        +1   | findbugs   | 1m 20s  | the patch passed
        +1   | javadoc    | 0m 22s  | the patch passed with JDK v1.8.0_72
        +1   | javadoc    | 0m 25s  | the patch passed with JDK v1.7.0_95
        -1   | unit       | 2m 5s   | hadoop-mapreduce-client-core in the patch failed with JDK v1.8.0_72.
        -1   | unit       | 2m 22s  | hadoop-mapreduce-client-core in the patch failed with JDK v1.7.0_95.
        +1   | asflicense | 0m 20s  | Patch does not generate ASF License warnings.
             |            | 40m 41s |



        Reason                           | Tests
        JDK v1.8.0_72 Failed junit tests | hadoop.mapreduce.lib.input.TestFileInputFormat
                                         | hadoop.mapred.TestFileInputFormat
        JDK v1.7.0_95 Failed junit tests | hadoop.mapreduce.lib.input.TestFileInputFormat
                                         | hadoop.mapred.TestFileInputFormat



        Subsystem | Report/Notes
        Docker Image | yetus/hadoop:0ca8df7
        JIRA Patch URL | https://issues.apache.org/jira/secure/attachment/12788265/MAPREDUCE-6635.patch
        JIRA Issue | MAPREDUCE-6635
        Optional Tests | asflicense compile javac javadoc mvninstall mvnsite unit findbugs checkstyle
        uname | Linux 0bd09c67eadb 3.13.0-36-lowlatency #63-Ubuntu SMP PREEMPT Wed Sep 3 21:56:12 UTC 2014 x86_64 x86_64 x86_64 GNU/Linux
        Build tool | maven
        Personality | /testptch/hadoop/patchprocess/precommit/personality/provided.sh
        git revision | trunk / fd1befb
        Default Java | 1.7.0_95
        Multi-JDK versions | /usr/lib/jvm/java-8-oracle:1.8.0_72 /usr/lib/jvm/java-7-openjdk-amd64:1.7.0_95
        findbugs | v3.0.0
        unit | https://builds.apache.org/job/PreCommit-MAPREDUCE-Build/6328/artifact/patchprocess/patch-unit-hadoop-mapreduce-project_hadoop-mapreduce-client_hadoop-mapreduce-client-core-jdk1.8.0_72.txt
        unit | https://builds.apache.org/job/PreCommit-MAPREDUCE-Build/6328/artifact/patchprocess/patch-unit-hadoop-mapreduce-project_hadoop-mapreduce-client_hadoop-mapreduce-client-core-jdk1.7.0_95.txt
        unit test logs | https://builds.apache.org/job/PreCommit-MAPREDUCE-Build/6328/artifact/patchprocess/patch-unit-hadoop-mapreduce-project_hadoop-mapreduce-client_hadoop-mapreduce-client-core-jdk1.8.0_72.txt
                       | https://builds.apache.org/job/PreCommit-MAPREDUCE-Build/6328/artifact/patchprocess/patch-unit-hadoop-mapreduce-project_hadoop-mapreduce-client_hadoop-mapreduce-client-core-jdk1.7.0_95.txt
        JDK v1.7.0_95 Test Results | https://builds.apache.org/job/PreCommit-MAPREDUCE-Build/6328/testReport/
        modules | C: hadoop-mapreduce-project/hadoop-mapreduce-client/hadoop-mapreduce-client-core U: hadoop-mapreduce-project/hadoop-mapreduce-client/hadoop-mapreduce-client-core
        Console output | https://builds.apache.org/job/PreCommit-MAPREDUCE-Build/6328/console
        Powered by | Apache Yetus 0.2.0-SNAPSHOT http://yetus.apache.org

        This message was automatically generated.

        sershe Sergey Shelukhin added a comment -

        +1 (non-binding)

        vvasudev Varun Vasudev added a comment -

        +1. I'll commit this tomorrow if no one objects.

        vvasudev Varun Vasudev added a comment -

        Committed to trunk, branch-2, branch-2.8, branch-2.7 and branch-2.6. Thanks Junping Du!

        hudson Hudson added a comment -

        FAILURE: Integrated in Hadoop-trunk-Commit #9346 (See https://builds.apache.org/job/Hadoop-trunk-Commit/9346/)
        MAPREDUCE-6635. Unsafe long to int conversion in (vvasudev: rev c6f2d761d5430eac6b9f07f137a7028de4e0660c)

        • hadoop-mapreduce-project/hadoop-mapreduce-client/hadoop-mapreduce-client-core/src/test/java/org/apache/hadoop/mapred/TestLineRecordReader.java
        • hadoop-mapreduce-project/CHANGES.txt
        • hadoop-mapreduce-project/hadoop-mapreduce-client/hadoop-mapreduce-client-core/src/main/java/org/apache/hadoop/mapreduce/lib/input/UncompressedSplitLineReader.java
        djp Junping Du added a comment -

        Thanks Varun Vasudev for review and commit!

        vinodkv Vinod Kumar Vavilapalli added a comment -

        Closing the JIRA as part of 2.7.3 release.


          People

          • Assignee: djp Junping Du
          • Reporter: sershe Sergey Shelukhin
          • Votes: 0
          • Watchers: 10
