Hadoop Map/Reduce > MAPREDUCE-5028

Maps fail when io.sort.mb is set to high value

    Details

    • Type: Bug
    • Status: Closed
    • Priority: Critical
    • Resolution: Fixed
    • Affects Version/s: 1.1.1, 2.0.3-alpha, 0.23.5
    • Fix Version/s: 1.2.0, 2.4.0
    • Component/s: None
    • Labels:
      None

      Description

      Verified the problem exists on branch-1 with the following configuration:

      Pseudo-dist mode: 2 maps/ 1 reduce, mapred.child.java.opts=-Xmx2048m, io.sort.mb=1280, dfs.block.size=2147483648

      Run teragen to generate 4 GB of data.
      Maps fail when you run wordcount with this configuration, with the following error:

      java.io.IOException: Spill failed
      	at org.apache.hadoop.mapred.MapTask$MapOutputBuffer.collect(MapTask.java:1031)
      	at org.apache.hadoop.mapred.MapTask$NewOutputCollector.write(MapTask.java:692)
      	at org.apache.hadoop.mapreduce.TaskInputOutputContext.write(TaskInputOutputContext.java:80)
      	at org.apache.hadoop.examples.WordCount$TokenizerMapper.map(WordCount.java:45)
      	at org.apache.hadoop.examples.WordCount$TokenizerMapper.map(WordCount.java:34)
      	at org.apache.hadoop.mapreduce.Mapper.run(Mapper.java:144)
      	at org.apache.hadoop.mapred.MapTask.runNewMapper(MapTask.java:766)
      	at org.apache.hadoop.mapred.MapTask.run(MapTask.java:370)
      	at org.apache.hadoop.mapred.Child$4.run(Child.java:255)
      	at java.security.AccessController.doPrivileged(Native Method)
      	at javax.security.auth.Subject.doAs(Subject.java:396)
      	at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1149)
      	at org.apache.hadoop.mapred.Child.main(Child.java:249)
      Caused by: java.io.EOFException
      	at java.io.DataInputStream.readInt(DataInputStream.java:375)
      	at org.apache.hadoop.io.IntWritable.readFields(IntWritable.java:38)
      	at org.apache.hadoop.io.serializer.WritableSerialization$WritableDeserializer.deserialize(WritableSerialization.java:67)
      	at org.apache.hadoop.io.serializer.WritableSerialization$WritableDeserializer.deserialize(WritableSerialization.java:40)
      	at org.apache.hadoop.mapreduce.ReduceContext.nextKeyValue(ReduceContext.java:116)
      	at org.apache.hadoop.mapreduce.ReduceContext.nextKey(ReduceContext.java:92)
      	at org.apache.hadoop.mapreduce.Reducer.run(Reducer.java:175)
      	at org.apache.hadoop.mapred.Task$NewCombinerRunner.combine(Task.java:1505)
      	at org.apache.hadoop.mapred.MapTask$MapOutputBuffer.sortAndSpill(MapTask.java:1438)
      	at org.apache.hadoop.mapred.MapTask$MapOutputBuffer.access$1800(MapTask.java:855)
      	at org.apache.hadoop.mapred.MapTask$MapOutputBuffer$SpillThread.run(MapTask.java:1346)
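      The EOFException raised in the combiner during sortAndSpill is a symptom rather than the root cause: with a sort buffer approaching 2 GB, int arithmetic on buffer offsets can overflow. A minimal sketch of that class of bug (illustrative variable names and values, not the actual MapTask.java code):

      ```java
      // Hedged illustration of int overflow with a large sort buffer;
      // names are hypothetical, not taken from MapTask.java.
      public class IntOverflowDemo {
          public static void main(String[] args) {
              int bufferLength = 1536 << 20;   // io.sort.mb=1536 -> 1,610,612,736 bytes, fits in int
              int aligned = bufferLength - 16; // a record offset near the end of the buffer
              // Summing two large int offsets (e.g. for modular wrap-around math
              // over a circular buffer) exceeds Integer.MAX_VALUE and wraps negative:
              int wrapped = aligned + bufferLength;
              // Widening to long before the addition avoids the wrap:
              long correct = (long) aligned + bufferLength;
              System.out.println(wrapped); // -1073741840
              System.out.println(correct); // 3221225456
          }
      }
      ```

      A negative offset produced this way then corrupts the in-memory segment bounds handed to the combiner, which is consistent with the reader hitting an unexpected end of stream.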
      
      1. mr-5028-3.patch
        20 kB
        Karthik Kambatla
      2. mr-5028-2.patch
        16 kB
        Karthik Kambatla
      3. mr-5028-1.patch
        16 kB
        Karthik Kambatla
      4. MR-5028_testapp.patch
        11 kB
        Arun C Murthy
      5. repro-mr-5028.patch
        9 kB
        Karthik Kambatla
      6. mr-5028-trunk.patch
        5 kB
        Karthik Kambatla
      7. mr-5028-trunk.patch
        4 kB
        Karthik Kambatla
      8. mr-5028-trunk.patch
        4 kB
        Karthik Kambatla
      9. mr-5028-branch1.patch
        2 kB
        Karthik Kambatla
      10. mr-5028-branch1.patch
        3 kB
        Karthik Kambatla
      11. mr-5028-branch1.patch
        3 kB
        Karthik Kambatla

          Activity

          Hudson added a comment -

          SUCCESS: Integrated in Hadoop-Mapreduce-trunk #1723 (See https://builds.apache.org/job/Hadoop-Mapreduce-trunk/1723/)
          MAPREDUCE-5028. Fixed a bug in MapTask that was causing mappers to fail when a large value of io.sort.mb is set. Contributed by Karthik Kambatla. (vinodkv: http://svn.apache.org/viewcvs.cgi/?root=Apache-SVN&view=rev&rev=1576170)

          • /hadoop/common/trunk/hadoop-common-project/hadoop-common/src/main/java/org/apache/hadoop/io/DataInputBuffer.java
          • /hadoop/common/trunk/hadoop-mapreduce-project/CHANGES.txt
          • /hadoop/common/trunk/hadoop-mapreduce-project/hadoop-mapreduce-client/hadoop-mapreduce-client-core/src/main/java/org/apache/hadoop/mapred/MapTask.java
          • /hadoop/common/trunk/hadoop-mapreduce-project/hadoop-mapreduce-client/hadoop-mapreduce-client-core/src/main/java/org/apache/hadoop/mapreduce/task/ReduceContextImpl.java
          • /hadoop/common/trunk/hadoop-mapreduce-project/hadoop-mapreduce-client/hadoop-mapreduce-client-core/src/main/java/org/apache/hadoop/mapreduce/task/reduce/InMemoryReader.java
          • /hadoop/common/trunk/hadoop-mapreduce-project/hadoop-mapreduce-client/hadoop-mapreduce-client-jobclient/src/test/java/org/apache/hadoop/mapreduce/LargeSorter.java
          • /hadoop/common/trunk/hadoop-mapreduce-project/hadoop-mapreduce-client/hadoop-mapreduce-client-jobclient/src/test/java/org/apache/hadoop/mapreduce/TestLargeSort.java
          • /hadoop/common/trunk/hadoop-mapreduce-project/hadoop-mapreduce-client/hadoop-mapreduce-client-jobclient/src/test/java/org/apache/hadoop/test/MapredTestDriver.java
          Hudson added a comment -

          SUCCESS: Integrated in Hadoop-Hdfs-trunk #1698 (See https://builds.apache.org/job/Hadoop-Hdfs-trunk/1698/)
          MAPREDUCE-5028. Fixed a bug in MapTask that was causing mappers to fail when a large value of io.sort.mb is set. Contributed by Karthik Kambatla. (vinodkv: http://svn.apache.org/viewcvs.cgi/?root=Apache-SVN&view=rev&rev=1576170)

          • /hadoop/common/trunk/hadoop-common-project/hadoop-common/src/main/java/org/apache/hadoop/io/DataInputBuffer.java
          • /hadoop/common/trunk/hadoop-mapreduce-project/CHANGES.txt
          • /hadoop/common/trunk/hadoop-mapreduce-project/hadoop-mapreduce-client/hadoop-mapreduce-client-core/src/main/java/org/apache/hadoop/mapred/MapTask.java
          • /hadoop/common/trunk/hadoop-mapreduce-project/hadoop-mapreduce-client/hadoop-mapreduce-client-core/src/main/java/org/apache/hadoop/mapreduce/task/ReduceContextImpl.java
          • /hadoop/common/trunk/hadoop-mapreduce-project/hadoop-mapreduce-client/hadoop-mapreduce-client-core/src/main/java/org/apache/hadoop/mapreduce/task/reduce/InMemoryReader.java
          • /hadoop/common/trunk/hadoop-mapreduce-project/hadoop-mapreduce-client/hadoop-mapreduce-client-jobclient/src/test/java/org/apache/hadoop/mapreduce/LargeSorter.java
          • /hadoop/common/trunk/hadoop-mapreduce-project/hadoop-mapreduce-client/hadoop-mapreduce-client-jobclient/src/test/java/org/apache/hadoop/mapreduce/TestLargeSort.java
          • /hadoop/common/trunk/hadoop-mapreduce-project/hadoop-mapreduce-client/hadoop-mapreduce-client-jobclient/src/test/java/org/apache/hadoop/test/MapredTestDriver.java
          Hudson added a comment -

          SUCCESS: Integrated in Hadoop-Yarn-trunk #506 (See https://builds.apache.org/job/Hadoop-Yarn-trunk/506/)
          MAPREDUCE-5028. Fixed a bug in MapTask that was causing mappers to fail when a large value of io.sort.mb is set. Contributed by Karthik Kambatla. (vinodkv: http://svn.apache.org/viewcvs.cgi/?root=Apache-SVN&view=rev&rev=1576170)

          • /hadoop/common/trunk/hadoop-common-project/hadoop-common/src/main/java/org/apache/hadoop/io/DataInputBuffer.java
          • /hadoop/common/trunk/hadoop-mapreduce-project/CHANGES.txt
          • /hadoop/common/trunk/hadoop-mapreduce-project/hadoop-mapreduce-client/hadoop-mapreduce-client-core/src/main/java/org/apache/hadoop/mapred/MapTask.java
          • /hadoop/common/trunk/hadoop-mapreduce-project/hadoop-mapreduce-client/hadoop-mapreduce-client-core/src/main/java/org/apache/hadoop/mapreduce/task/ReduceContextImpl.java
          • /hadoop/common/trunk/hadoop-mapreduce-project/hadoop-mapreduce-client/hadoop-mapreduce-client-core/src/main/java/org/apache/hadoop/mapreduce/task/reduce/InMemoryReader.java
          • /hadoop/common/trunk/hadoop-mapreduce-project/hadoop-mapreduce-client/hadoop-mapreduce-client-jobclient/src/test/java/org/apache/hadoop/mapreduce/LargeSorter.java
          • /hadoop/common/trunk/hadoop-mapreduce-project/hadoop-mapreduce-client/hadoop-mapreduce-client-jobclient/src/test/java/org/apache/hadoop/mapreduce/TestLargeSort.java
          • /hadoop/common/trunk/hadoop-mapreduce-project/hadoop-mapreduce-client/hadoop-mapreduce-client-jobclient/src/test/java/org/apache/hadoop/test/MapredTestDriver.java
          Hudson added a comment -

          SUCCESS: Integrated in Hadoop-trunk-Commit #5302 (See https://builds.apache.org/job/Hadoop-trunk-Commit/5302/)
          MAPREDUCE-5028. Fixed a bug in MapTask that was causing mappers to fail when a large value of io.sort.mb is set. Contributed by Karthik Kambatla. (vinodkv: http://svn.apache.org/viewcvs.cgi/?root=Apache-SVN&view=rev&rev=1576170)

          • /hadoop/common/trunk/hadoop-common-project/hadoop-common/src/main/java/org/apache/hadoop/io/DataInputBuffer.java
          • /hadoop/common/trunk/hadoop-mapreduce-project/CHANGES.txt
          • /hadoop/common/trunk/hadoop-mapreduce-project/hadoop-mapreduce-client/hadoop-mapreduce-client-core/src/main/java/org/apache/hadoop/mapred/MapTask.java
          • /hadoop/common/trunk/hadoop-mapreduce-project/hadoop-mapreduce-client/hadoop-mapreduce-client-core/src/main/java/org/apache/hadoop/mapreduce/task/ReduceContextImpl.java
          • /hadoop/common/trunk/hadoop-mapreduce-project/hadoop-mapreduce-client/hadoop-mapreduce-client-core/src/main/java/org/apache/hadoop/mapreduce/task/reduce/InMemoryReader.java
          • /hadoop/common/trunk/hadoop-mapreduce-project/hadoop-mapreduce-client/hadoop-mapreduce-client-jobclient/src/test/java/org/apache/hadoop/mapreduce/LargeSorter.java
          • /hadoop/common/trunk/hadoop-mapreduce-project/hadoop-mapreduce-client/hadoop-mapreduce-client-jobclient/src/test/java/org/apache/hadoop/mapreduce/TestLargeSort.java
          • /hadoop/common/trunk/hadoop-mapreduce-project/hadoop-mapreduce-client/hadoop-mapreduce-client-jobclient/src/test/java/org/apache/hadoop/test/MapredTestDriver.java
          Vinod Kumar Vavilapalli added a comment -

          Committed this to trunk, branch-2 and branch-2.4. Thanks Karthik!

          Jason Lowe/Thomas Graves et al, please pull this into 0.23 if needed.

          Hadoop QA added a comment -

          +1 overall. Here are the results of testing the latest attachment
          http://issues.apache.org/jira/secure/attachment/12633783/mr-5028-3.patch
          against trunk revision .

          +1 @author. The patch does not contain any @author tags.

          +1 tests included. The patch appears to include 3 new or modified test files.

          +1 javac. The applied patch does not increase the total number of javac compiler warnings.

          +1 javadoc. There were no new javadoc warning messages.

          +1 eclipse:eclipse. The patch built with eclipse:eclipse.

          +1 findbugs. The patch does not introduce any new Findbugs (version 1.3.9) warnings.

          +1 release audit. The applied patch does not increase the total number of release audit warnings.

          +1 core tests. The patch passed unit tests in hadoop-common-project/hadoop-common hadoop-mapreduce-project/hadoop-mapreduce-client/hadoop-mapreduce-client-core hadoop-mapreduce-project/hadoop-mapreduce-client/hadoop-mapreduce-client-jobclient.

          +1 contrib tests. The patch passed contrib unit tests.

          Test results: https://builds.apache.org/job/PreCommit-MAPREDUCE-Build/4409//testReport/
          Console output: https://builds.apache.org/job/PreCommit-MAPREDUCE-Build/4409//console

          This message is automatically generated.

          Vinod Kumar Vavilapalli added a comment -

          The latest patch looks good to me. +1. The test is failing without the patch and passes with.

          Jenkins is having issues. Trying again manually. Will commit this once Jenkins says okay.

          Karthik Kambatla added a comment -

          Updated patch adds a unit test and incorporates most of Vinod's suggestions.

          "There seems to be one more related bug w.r.t. usage of DataInputBuffer.reset()."

          One of my earlier patches (the one that got reverted last year) was needlessly fixing this related "bug".

          "Can you also cross verify Task.ValuesIterator.readNextKey() and readNextValue()?"

          Looks like you are right, but we need to be very careful and run several workloads before including them (got bit by this before). If you are okay with it, I would like to address them in MAPREDUCE-5032, so we can get this piece in first, as it has been tested by multiple people.

          Karthik Kambatla added a comment -

          "Why would they take 10 mins?"

          Running LargeSorter with io.sort.mb = 1536 takes more than 4 minutes on a relatively beefy machine. Running two more jobs would, I assume, bring the total to around 10 minutes. On less beefy machines, it could take longer.

          If you insist on adding a unit test, I can try putting one together.

          Vinod Kumar Vavilapalli added a comment -

          We have a lot of MR jobs running as part of our test-suite. Aren't these similar to those (clearly I didn't run the large-sorter myself)? Why would they take 10 mins?

          Karthik Kambatla added a comment -

          Testing for 3 different values of io.sort.mb - different sizes - will probably take 10 minutes altogether. I agree the unit tests help with making sure we don't introduce regressions, but adding 10 minutes to each run of precommit build etc. seems like a huge overhead. What do you think of adding a module for integration/system/end-to-end tests that get run nightly, but not for precommit-builds etc.? That seems like a better approach in the long run - we can include stress tests for YARN state-store, HA etc. as well. If you think that would be more reasonable, I propose we do it in follow-up tasks.

          I can add unit tests along the lines of repro-mr-5028.patch and address other concerns with the patch.

          Vinod Kumar Vavilapalli added a comment -

          Looked at the patch. Tx for working on it, this one is a year in the making. Let's do the last push!

          So the main bug fix is the int overflow stuff in MapTask.java, right?

          Not caused by your patch, but minor suggestions for InMemoryReader.java

          • start and length can and should be marked final.
          • memDataIn can be private and final

          Can we fix the javadoc of the getLength() API? Ideally I'd also rename the API, but will leave it up to you. I don't have a problem reviewing a patch with that rename. Anyway, back to the javadoc for DataInputBuffer.getLength() and DataInputBuffer.Buffer.getLength(): we can say what Chris Douglas said above (from ByteArrayInputStream), "returns the index one greater than the last valid character in the input stream buffer."

          There seems to be one more related bug w.r.t. usage of DataInputBuffer.reset(). Can you also cross verify Task.ValuesIterator.readNextKey() and readNextValue()?

          Why not run this LargeSorter with large values for io.sort.mb as a unit test? I'm afraid that we may unknowingly add more such issues in future if we don't automatically run it as part of our test-suite. The problem will be playing with surefire's heap sizes, but that's about it.

          We should hop onto MAPREDUCE-5032 next as I am not sure if we are fixing everything here.
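          The getLength() ambiguity raised above can be sketched as follows, assuming the ByteArrayInputStream-style semantics quoted from Chris Douglas (the buffer and index values are made up for illustration):

          ```java
          // Hedged illustration: under ByteArrayInputStream-style semantics,
          // "length" is the index one greater than the last valid byte,
          // NOT the count of valid bytes.
          public class GetLengthSemantics {
              public static void main(String[] args) {
                  int start = 40;               // first valid index, e.g. after reset(bytes, 40, 20)
                  int count = 20;               // number of valid bytes
                  int endIndex = start + count; // 60: "one greater than the last valid index"
                  // A caller that treats endIndex as a byte count over-reads by
                  // `start` bytes; the remaining data is endIndex minus position:
                  int remaining = endIndex - start;
                  System.out.println(remaining); // 20
              }
          }
          ```

          Callers that mix up the two conventions read past the valid region, which is exactly the kind of misuse the javadoc fix would guard against.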

          Karthik Kambatla added a comment -

          It has been fixed in branch-1 since 1.2.0.

          Vinod Kumar Vavilapalli added a comment -

          Looking at it.

          Not sure of any more 1.x releases. IAC, dropping fix-version and using target-version only.

          Karthik Kambatla added a comment -

          Vinod Kumar Vavilapalli offered to look at this patch. Thanks Vinod.

          Karthik Kambatla added a comment -

          javac warning fix.

          Hadoop QA added a comment -

          -1 overall. Here are the results of testing the latest attachment
          http://issues.apache.org/jira/secure/attachment/12633012/mr-5028-1.patch
          against trunk revision .

          +1 @author. The patch does not contain any @author tags.

          +1 tests included. The patch appears to include 2 new or modified test files.

          -1 javac. The applied patch generated 1532 javac compiler warnings (more than the trunk's current 1531 warnings).

          +1 javadoc. There were no new javadoc warning messages.

          +1 eclipse:eclipse. The patch built with eclipse:eclipse.

          +1 findbugs. The patch does not introduce any new Findbugs (version 1.3.9) warnings.

          +1 release audit. The applied patch does not increase the total number of release audit warnings.

          +1 core tests. The patch passed unit tests in hadoop-mapreduce-project/hadoop-mapreduce-client/hadoop-mapreduce-client-core hadoop-mapreduce-project/hadoop-mapreduce-client/hadoop-mapreduce-client-jobclient.

          +1 contrib tests. The patch passed contrib unit tests.

          Test results: https://builds.apache.org/job/PreCommit-MAPREDUCE-Build/4395//testReport/
          Javac warnings: https://builds.apache.org/job/PreCommit-MAPREDUCE-Build/4395//artifact/trunk/patchprocess/diffJavacWarnings.txt
          Console output: https://builds.apache.org/job/PreCommit-MAPREDUCE-Build/4395//console

          This message is automatically generated.

          Karthik Kambatla added a comment -

          Here is an updated patch (mr-5028-1.patch):

          • The fixes stay the same. Thanks to everyone who has tried it out and verified that it works.
          • LargeSorter example, which builds on Arun's _testapp.patch
            • Mapper generates a bunch of data, Reducer does nothing.
            • LargeSorter allows setting bytes-per-map, num-maps, and num-reduces.
            • By default, bytes-per-map is twice io.sort.mb, or 1 GB if io.sort.mb is also unspecified; num-maps = 2 and num-reduces = 1.
            • The job sets mapreduce.map.memory.mb to max(twice io.sort.mb, 1 GB); java-opts are set 200 MB lower.
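          The sizing defaults described above can be sketched as follows (the helper names are illustrative only; they do not exist in LargeSorter.java):

          ```java
          // Hedged sketch of the LargeSorter sizing defaults described above.
          public class LargeSorterDefaults {
              static final int ONE_GB_MB = 1024;

              // bytes-per-map defaults to twice io.sort.mb, or 1 GB when io.sort.mb is unset
              static int defaultBytesPerMapMb(Integer ioSortMb) {
                  return ioSortMb != null ? 2 * ioSortMb : ONE_GB_MB;
              }

              // container memory: max(2 * io.sort.mb, 1 GB)
              static int mapMemoryMb(int ioSortMb) {
                  return Math.max(2 * ioSortMb, ONE_GB_MB);
              }

              // java-opts heap is set 200 MB below the container size
              static int javaHeapMb(int containerMb) {
                  return containerMb - 200;
              }

              public static void main(String[] args) {
                  System.out.println(defaultBytesPerMapMb(1536));    // 3072
                  System.out.println(mapMemoryMb(1536));             // 3072
                  System.out.println(javaHeapMb(mapMemoryMb(1536))); // 2872
              }
          }
          ```

          Keeping the heap 200 MB below the container size leaves headroom for non-heap JVM memory, so the task is not killed for exceeding its container limit.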

          Adding unit tests that run this LargeSorter for different values of io.sort.mb and bytes-per-map seems like overkill to me. On a pseudo-dist cluster, running LargeSorter with mapreduce.task.io.sort.mb set to 1536 takes more than 4 minutes.

          To verify this patch, I ran this LargeSorter job for different values without the fix. For io.sort.mb = 1536, I ran into integer overflow issues. The job succeeds with the patch applied. I believe this is sufficient validation for the fix.

          Chris Douglas, Arun C Murthy, Alejandro Abdelnur - can any of you take a look at the latest patch?

          Carlo Curino added a comment -

          I am swamped till the end of the month for sure, so you will probably get to it before me... If it is still open at that point, I can help by running this at scale and confirming whether your fix works.

          Karthik Kambatla added a comment -

          Hey Carlo - sorry I have been distracted with the HA work.

          Per Arun's suggestion, I was trying to write a unit test that doesn't take too long but still tests the fix. I spent half a day, but couldn't come up with anything tangible. I might be able to spend more time on it in a couple of weeks. Please feel free to pick this up - a combination of MR-5028_testapp.patch (Arun's patch) and repro-mr-5028.patch could be a good starting point for tests.

          I deployed the fix on a cluster internally, ran it for a while, and haven't noticed any issues with the latest patch. Gopal seems to have validated through extensive logging and debugging as well.

          Carlo Curino added a comment -

          Hey Karthik, any news on this?

          I think we might have stumbled upon the same issue while running gridmix with some large parameters for io.sort.mb.
          This was with a very recent version of trunk (errors showing up in LoadJob during cleanup() and regular map() calls).

          Karthik Kambatla added a comment -

          Thanks Gopal for verifying the fix. I did write some tests using Arun's patch, but have been running into issues where the test crashes and doesn't report failure. Taking a closer look; I'll provide an update as soon as I have one.

          Gopal V added a comment -

          I've reviewed this, and the latest patch (please number the uploads for easy reference) works correctly and prevents the overflow.

          +1 on the patch.

          Gopal V added a comment -

          I ran the tests again because something didn't seem right - my '+' operation was turning into a string concat operation in logging (ugh).

          2013-05-15 18:52:47,876 INFO [SpillThread] org.apache.hadoop.io.DataInputBuffer: input.length = 1342177280, start = 687161440, length = 687161444
          2013-05-15 18:52:47,876 INFO [SpillThread] org.apache.hadoop.io.DataInputBuffer: count math 687161440 + 687161444 = 1374322884
          2013-05-15 18:52:47,876 INFO [SpillThread] org.apache.hadoop.io.DataInputBuffer: org.apache.hadoop.io.DataInputBuffer$Buffer.reset(DataInputBuffer.java:58)
          2013-05-15 18:52:47,876 INFO [SpillThread] org.apache.hadoop.io.DataInputBuffer: org.apache.hadoop.io.DataInputBuffer.reset(DataInputBuffer.java:92)
          2013-05-15 18:52:47,876 INFO [SpillThread] org.apache.hadoop.io.DataInputBuffer: org.apache.hadoop.mapreduce.task.ReduceContextImpl.nextKeyValue(ReduceContextImpl.java:144)
          2013-05-15 18:52:47,876 INFO [SpillThread] org.apache.hadoop.io.DataInputBuffer: org.apache.hadoop.mapreduce.task.ReduceContextImpl$ValueIterator.next(ReduceContextImpl.java:237)
          ....
          2013-05-15 18:52:47,861 INFO [SpillThread] org.apache.hadoop.io.DataInputBuffer: input.length = 1342177280, start = 905211353, length = 905211357
          2013-05-15 18:52:47,861 INFO [SpillThread] org.apache.hadoop.io.DataInputBuffer: count math 905211353 + 905211357 = 1810422710
          2013-05-15 18:52:47,861 INFO [SpillThread] org.apache.hadoop.io.DataInputBuffer: org.apache.hadoop.io.DataInputBuffer$Buffer.reset(DataInputBuffer.java:58)
          2013-05-15 18:52:47,861 INFO [SpillThread] org.apache.hadoop.io.DataInputBuffer: org.apache.hadoop.io.DataInputBuffer.reset(DataInputBuffer.java:92)
          2013-05-15 18:52:47,861 INFO [SpillThread] org.apache.hadoop.io.DataInputBuffer: org.apache.hadoop.mapreduce.task.ReduceContextImpl.nextKeyValue(ReduceContextImpl.java:144)
          2013-05-15 18:52:47,861 INFO [SpillThread] org.apache.hadoop.io.DataInputBuffer: org.apache.hadoop.mapreduce.task.ReduceContextImpl.nextKey(ReduceContextImpl.java:121)
          

          Those are wrong, definitely wrong.
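          For reference, the '+'-into-string-concat trap mentioned above is a standard Java precedence pitfall. A minimal illustration (hypothetical code, not from the patch or Hadoop itself):

          ```java
          // Inside a string concatenation, '+' between two ints is still string
          // append unless the arithmetic is explicitly parenthesized.
          public class ConcatPitfall {
              static String logLine(int start, int length) {
                  // Looks like addition, but evaluates left-to-right as string concat.
                  return "count math = " + start + length;
              }

              static String logLineFixed(int start, int length) {
                  // Parentheses force numeric addition (widened to long to be safe).
                  return "count math = " + ((long) start + length);
              }

              public static void main(String[] args) {
                  // Prints "count math = 687161440687161444" (concatenated, wrong)
                  System.out.println(logLine(687161440, 687161444));
                  // Prints "count math = 1374322884" (the actual sum)
                  System.out.println(logLineFixed(687161440, 687161444));
              }
          }
          ```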

          Gopal V added a comment -

          Karthik Kambatla, I took some time to go through your patch.

          The patch contains two different fixes, which deserve their own tests and commits.

          Good catch on the overflow with the kvindex/kvend variables. That is a real bug in the mapper with large buffers, and a good, clean fix.

          For the second issue, I found that it triggers when the inline Combiner runs with more than 3 spills in the SpillThread. This wasn't exercised by Arun C Murthy's test-app (but the word-count sum combiner does trigger it cleanly).

          There I found your fix to be suspect. So, for the sake of data, I logged every call to reset and crawled a 13 GB log file to find the offenders (i.e., calls where (long)start + (long)length > input.length).

          This particular backtrace stood out as a key offender, which I found more significant than merely locating the overflow cases.

          org.apache.hadoop.mapred.MapTask$MapOutputBuffer$MRResultIterator.getKey(MapTask.java:1784)
          org.apache.hadoop.mapreduce.task.ReduceContextImpl.nextKeyValue(ReduceContextImpl.java:138)
          

          I will take a closer look at that code; it might be cleaner to tackle the issue at its first cause.
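          A sketch of that offender check (hypothetical class name, not code from the patch): a reset(start, length) call is suspect when start + length, computed in 64-bit arithmetic, exceeds the backing buffer's length, since in 32-bit int arithmetic the same sum can wrap negative and slip past a naive bounds check.

          ```java
          // Detect reset() calls whose span exceeds the backing buffer.
          public class ResetCheck {
              // Returns true if this reset would read past the end of the buffer.
              static boolean isOffender(int start, int length, int bufferLength) {
                  // Widen before adding so the comparison cannot itself overflow.
                  return (long) start + (long) length > bufferLength;
              }

              public static void main(String[] args) {
                  int bufLen = 1342177280; // 1280 MB, as in the logs above
                  System.out.println(isOffender(905211353, 905211357, bufLen)); // true
                  System.out.println(isOffender(0, bufLen, bufLen));            // false
              }
          }
          ```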

          Karthik Kambatla added a comment -

          I realized the default num_reduces can be zero; I was under the impression we wanted to test the shuffle with this as well. It makes sense now. Thanks.

          Karthik Kambatla added a comment -

          Thanks Arun. Having these tests is a great idea, and the test-app patch looks mostly good - the default num_reduces should probably be 1. Let me try adding some tests using it.

          Gopal V added a comment -

          HADOOP_CLASSPATH=./hadoop-mapreduce-project/hadoop-mapreduce-client/hadoop-mapreduce-client-jobclient/target/hadoop-mapreduce-client-jobclient-2.0.5-SNAPSHOT-tests.jar hadoop org.apache.hadoop.mapreduce.LargeSorter -D io.sort.mb=1280 -D mapreduce.map.memory.mb=3072 -D mapred.map.child.java.opts="-server -Xmx2248m -Djava.net.preferIPv4Stack=true" /tmp/q.$RANDOM -r 2

          Ran with extra logging:

          2013-05-14 20:52:52,911 INFO [main] org.apache.hadoop.mapred.MapTask: Starting flush of map output
          2013-05-14 20:52:56,794 INFO [SpillThread] org.apache.hadoop.mapred.MapTask: Finished spill 0
          2013-05-14 20:52:56,795 INFO [main] org.apache.hadoop.mapred.MapTask: kvstart, kvend = int(-134521988) or long(268131196)

          Gopal V added a comment -

          The integer overflow in kvindex/kvstart/kvend seems to be a pretty direct problem for large buffers (i.e., + buffer.length overflows).

          Taking a closer look at the .reset() changes separately, tomorrow.
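          A minimal sketch of that overflow (hypothetical code, not Hadoop's): once the sort buffer is large enough, adding buffer.length to an int offset can exceed Integer.MAX_VALUE and silently wrap negative, while widening to long first keeps the arithmetic correct.

          ```java
          // Demonstrates how int index arithmetic wraps with a >1 GB buffer.
          public class IndexOverflow {
              static int wrappingAdd(int pos, int bufferLength) {
                  return pos + bufferLength; // may silently wrap past Integer.MAX_VALUE
              }

              static long safeAdd(int pos, int bufferLength) {
                  return (long) pos + bufferLength; // widened before the addition
              }

              public static void main(String[] args) {
                  int bufferLength = 1342177280; // 1280 MB sort buffer
                  int pos = 1_000_000_000;
                  System.out.println(wrappingAdd(pos, bufferLength)); // negative: wrapped
                  System.out.println(safeAdd(pos, bufferLength));     // 2342177280
              }
          }
          ```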

          Arun C Murthy added a comment -

          I think a set of tests that verify io.sort.mb values of 128M, 256M, and 1536M (some normal and some not-so-normal values) would be very useful.

          I've attached a test-application we can use to drive a set of tests.

          Again, apologies for losing track - I'll try to help as best as I can to speed things up. Thanks!

          Arun C Murthy added a comment -

          Karthik Kambatla Sorry! I've been lax on this one. I'm on a flight now, which should help - looking right now.

          Karthik Kambatla added a comment -

          Hey Arun C Murthy: just checking if you had a chance to take a look at this.

          Karthik Kambatla added a comment -

          The test failures are a result of the new tests added in repro-mr-5028.patch specifically to reproduce the issue.

          Hadoop QA added a comment -

          -1 overall. Here are the results of testing the latest attachment
          http://issues.apache.org/jira/secure/attachment/12577110/repro-mr-5028.patch
          against trunk revision .

          +1 @author. The patch does not contain any @author tags.

          +1 tests included. The patch appears to include 2 new or modified test files.

          +1 javac. The applied patch does not increase the total number of javac compiler warnings.

          +1 javadoc. The javadoc tool did not generate any warning messages.

          +1 eclipse:eclipse. The patch built with eclipse:eclipse.

          +1 findbugs. The patch does not introduce any new Findbugs (version 1.3.9) warnings.

          +1 release audit. The applied patch does not increase the total number of release audit warnings.

          -1 core tests. The patch failed these unit tests in hadoop-mapreduce-project/hadoop-mapreduce-client/hadoop-mapreduce-client-core:

          org.apache.hadoop.mapred.TestMapOutputBuffer
          org.apache.hadoop.mapreduce.task.reduce.TestInMemoryReader

          +1 contrib tests. The patch passed contrib unit tests.

          Test results: https://builds.apache.org/job/PreCommit-MAPREDUCE-Build/3520//testReport/
          Console output: https://builds.apache.org/job/PreCommit-MAPREDUCE-Build/3520//console

          This message is automatically generated.

          Arun C Murthy added a comment -

          Sorry, been busy - thanks for the update. Looking now. Tx.

          Karthik Kambatla added a comment -

          Hey Arun C Murthy, did you get a chance to take a look at the patch? Please let me know if I can provide any more information.

          Karthik Kambatla added a comment -

          Arun, I uploaded a patch (repro-mr-5028.patch) with unit tests that reproduce the problem. Unfortunately, the methods causing the integer overflow do not have the integer arithmetic stubbed out, so the tests actually end up creating large buffers.

          I don't think we should include them, but I am sure they will come in handy to make sure the posted patch actually fixes the issue. The tests cover 3 of the 4 changes - the 4th is embedded too deep to write a unit test for.

          If we want to include the tests, we would have to stub out the array index calculations into other methods, which doesn't look very useful either.

          Arun C Murthy added a comment -

          Karthik, I'm looking, sorry - got sick this week.

          Meanwhile, can you please add a unit test for this? Tx.

          Karthik Kambatla added a comment -

          Hi Arun C Murthy, did you get a chance to take a look at the patch? Let me know if you have any tests/suggestions in mind.

          Hudson added a comment -

          Integrated in Hadoop-Mapreduce-trunk #1378 (See https://builds.apache.org/job/Hadoop-Mapreduce-trunk/1378/)
          Reverting MAPREDUCE-5028 (commit 1457918) (Revision 1458433)

          Result = SUCCESS
          tucu : http://svn.apache.org/viewcvs.cgi/?root=Apache-SVN&view=rev&rev=1458433
          Files :

          • /hadoop/common/trunk/hadoop-mapreduce-project/CHANGES.txt
          • /hadoop/common/trunk/hadoop-mapreduce-project/hadoop-mapreduce-client/hadoop-mapreduce-client-core/src/main/java/org/apache/hadoop/mapred/MapTask.java
          • /hadoop/common/trunk/hadoop-mapreduce-project/hadoop-mapreduce-client/hadoop-mapreduce-client-core/src/main/java/org/apache/hadoop/mapreduce/task/ReduceContextImpl.java
          • /hadoop/common/trunk/hadoop-mapreduce-project/hadoop-mapreduce-client/hadoop-mapreduce-client-core/src/main/java/org/apache/hadoop/mapreduce/task/reduce/InMemoryReader.java
          Hudson added a comment -

          Integrated in Hadoop-Hdfs-trunk #1350 (See https://builds.apache.org/job/Hadoop-Hdfs-trunk/1350/)
          Reverting MAPREDUCE-5028 (commit 1457918) (Revision 1458433)

          Result = FAILURE
          tucu : http://svn.apache.org/viewcvs.cgi/?root=Apache-SVN&view=rev&rev=1458433
          Files :

          • /hadoop/common/trunk/hadoop-mapreduce-project/CHANGES.txt
          • /hadoop/common/trunk/hadoop-mapreduce-project/hadoop-mapreduce-client/hadoop-mapreduce-client-core/src/main/java/org/apache/hadoop/mapred/MapTask.java
          • /hadoop/common/trunk/hadoop-mapreduce-project/hadoop-mapreduce-client/hadoop-mapreduce-client-core/src/main/java/org/apache/hadoop/mapreduce/task/ReduceContextImpl.java
          • /hadoop/common/trunk/hadoop-mapreduce-project/hadoop-mapreduce-client/hadoop-mapreduce-client-core/src/main/java/org/apache/hadoop/mapreduce/task/reduce/InMemoryReader.java
          Hudson added a comment -

          Integrated in Hadoop-Hdfs-0.23-Build #559 (See https://builds.apache.org/job/Hadoop-Hdfs-0.23-Build/559/)
          svn merge -r 1458380:1458376 Reverted MAPREDUCE-5028 from branch-0.23 (Revision 1458469)
          MAPREDUCE-5028. Maps fail when io.sort.mb is set to high value. (kkambatl via tgraves) (Revision 1458380)

          Result = SUCCESS
          bobby : http://svn.apache.org/viewcvs.cgi/?root=Apache-SVN&view=rev&rev=1458469
          Files :

          • /hadoop/common/branches/branch-0.23/hadoop-mapreduce-project/CHANGES.txt
          • /hadoop/common/branches/branch-0.23/hadoop-mapreduce-project/hadoop-mapreduce-client/hadoop-mapreduce-client-core/src/main/java/org/apache/hadoop/mapred/MapTask.java
          • /hadoop/common/branches/branch-0.23/hadoop-mapreduce-project/hadoop-mapreduce-client/hadoop-mapreduce-client-core/src/main/java/org/apache/hadoop/mapreduce/task/ReduceContextImpl.java
          • /hadoop/common/branches/branch-0.23/hadoop-mapreduce-project/hadoop-mapreduce-client/hadoop-mapreduce-client-core/src/main/java/org/apache/hadoop/mapreduce/task/reduce/InMemoryReader.java

          tgraves : http://svn.apache.org/viewcvs.cgi/?root=Apache-SVN&view=rev&rev=1458380
          Files :

          • /hadoop/common/branches/branch-0.23/hadoop-mapreduce-project/CHANGES.txt
          • /hadoop/common/branches/branch-0.23/hadoop-mapreduce-project/hadoop-mapreduce-client/hadoop-mapreduce-client-core/src/main/java/org/apache/hadoop/mapred/MapTask.java
          • /hadoop/common/branches/branch-0.23/hadoop-mapreduce-project/hadoop-mapreduce-client/hadoop-mapreduce-client-core/src/main/java/org/apache/hadoop/mapreduce/task/ReduceContextImpl.java
          • /hadoop/common/branches/branch-0.23/hadoop-mapreduce-project/hadoop-mapreduce-client/hadoop-mapreduce-client-core/src/main/java/org/apache/hadoop/mapreduce/task/reduce/InMemoryReader.java
          Hudson added a comment -

          Integrated in Hadoop-Yarn-trunk #161 (See https://builds.apache.org/job/Hadoop-Yarn-trunk/161/)
          Reverting MAPREDUCE-5028 (commit 1457918) (Revision 1458433)

          Result = SUCCESS
          tucu : http://svn.apache.org/viewcvs.cgi/?root=Apache-SVN&view=rev&rev=1458433
          Files :

          • /hadoop/common/trunk/hadoop-mapreduce-project/CHANGES.txt
          • /hadoop/common/trunk/hadoop-mapreduce-project/hadoop-mapreduce-client/hadoop-mapreduce-client-core/src/main/java/org/apache/hadoop/mapred/MapTask.java
          • /hadoop/common/trunk/hadoop-mapreduce-project/hadoop-mapreduce-client/hadoop-mapreduce-client-core/src/main/java/org/apache/hadoop/mapreduce/task/ReduceContextImpl.java
          • /hadoop/common/trunk/hadoop-mapreduce-project/hadoop-mapreduce-client/hadoop-mapreduce-client-core/src/main/java/org/apache/hadoop/mapreduce/task/reduce/InMemoryReader.java
          Arun C Murthy added a comment -

          I'm currently away at Hadoop Summit; can I please get a few more days to review this? Tx.

          Hadoop QA added a comment -

          -1 overall. Here are the results of testing the latest attachment
          http://issues.apache.org/jira/secure/attachment/12574490/mr-5028-trunk.patch
          against trunk revision .

          +1 @author. The patch does not contain any @author tags.

          -1 tests included. The patch doesn't appear to include any new or modified tests.
          Please justify why no new tests are needed for this patch.
          Also please list what manual steps were performed to verify this patch.

          +1 javac. The applied patch does not increase the total number of javac compiler warnings.

          +1 javadoc. The javadoc tool did not generate any warning messages.

          +1 eclipse:eclipse. The patch built with eclipse:eclipse.

          +1 findbugs. The patch does not introduce any new Findbugs (version 1.3.9) warnings.

          +1 release audit. The applied patch does not increase the total number of release audit warnings.

          +1 core tests. The patch passed unit tests in hadoop-mapreduce-project/hadoop-mapreduce-client/hadoop-mapreduce-client-core.

          +1 contrib tests. The patch passed contrib unit tests.

          Test results: https://builds.apache.org/job/PreCommit-MAPREDUCE-Build/3440//testReport/
          Console output: https://builds.apache.org/job/PreCommit-MAPREDUCE-Build/3440//console

          This message is automatically generated.

          Karthik Kambatla added a comment -

          Uploaded a patch with one additional change. More details below:

          1. The trunk patch from March 4 worked for large values of io.sort.mb, but caused a bunch of test failures.
          2. The trunk patch from March 19 fixed the test failures by removing an adjustment to a DataInputBuffer#reset() call that wasn't supposed to be there.
          3. However, that adjustment was partly responsible for the original patch working for large values of io.sort.mb. Removing it fixed the test failures, but jobs with large values of io.sort.mb started failing again.
          4. The fix is an adjustment to the DataInputBuffer#reset() call in ReduceContextImpl. A complementary change to ReduceContext exists in the branch-1 patch. While working on the original patch, I did make this adjustment, but later removed it on noticing negative offsets in jobs (the same behavior as in the test failures). My mistake was undoing the wrong adjustment. The latest patch (19 Mar 22:26) fixes this.

          The following steps validate the patch:

          1. Ran all tests under hadoop-mapreduce-project
          2. Ran (pi 4 1000), (teragen 4GB data - 4 mappers), (wordcount on teragen output) on a pseudo-dist cluster with dfs.block.size=64MB, java.opts=1GB, io.sort.mb=256MB
          3. Ran (pi 4 1000), (teragen 4GB data - 2 mappers), (wordcount on teragen output) on a pseudo-dist cluster with dfs.block.size=2GB, java.opts=2.5GB, io.sort.mb=1280MB

          Please suggest any additional testing that you think is required.

          Thanks again Chris for immediately notifying us of the issue. Sorry Alejandro and Bobby for the additional trouble.
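
          The class of reset() misuse being adjusted here can be sketched with a toy stand-in for DataInputBuffer (the Buffer class, its field names, and the demo values below are illustrative, not the actual Hadoop code): reset(input, start, length) takes a segment length as its third argument, so a caller that passes an end offset instead effectively double-counts start and corrupts the computed end of the segment.

          ```java
          // Toy stand-in mirroring DataInputBuffer's reset(byte[] input, int start, int length)
          // contract; NOT the actual Hadoop implementation.
          class Buffer {
              byte[] data;
              int pos;
              int end;

              void reset(byte[] input, int start, int length) {
                  this.data = input;
                  this.pos = start;
                  this.end = start + length; // a caller passing an end offset here double-counts start
              }
          }

          public class ResetMisuseDemo {
              public static void main(String[] args) {
                  byte[] buf = new byte[100];
                  Buffer b = new Buffer();
                  int start = 40;
                  int segmentEnd = 60;

                  // Correct: third argument is a length.
                  b.reset(buf, start, segmentEnd - start);
                  System.out.println(b.end); // 60

                  // Misuse: passing the end offset as if it were a length.
                  b.reset(buf, start, segmentEnd);
                  System.out.println(b.end); // 100 -- the reader would run 40 bytes past the segment
              }
          }
          ```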

          Hadoop QA added a comment -

          -1 overall. Here are the results of testing the latest attachment
          http://issues.apache.org/jira/secure/attachment/12574399/mr-5028-trunk.patch
          against trunk revision .

          +1 @author. The patch does not contain any @author tags.

          -1 tests included. The patch doesn't appear to include any new or modified tests.
          Please justify why no new tests are needed for this patch.
          Also please list what manual steps were performed to verify this patch.

          +1 javac. The applied patch does not increase the total number of javac compiler warnings.

          +1 javadoc. The javadoc tool did not generate any warning messages.

          +1 eclipse:eclipse. The patch built with eclipse:eclipse.

          +1 findbugs. The patch does not introduce any new Findbugs (version 1.3.9) warnings.

          +1 release audit. The applied patch does not increase the total number of release audit warnings.

          +1 core tests. The patch passed unit tests in hadoop-mapreduce-project/hadoop-mapreduce-client/hadoop-mapreduce-client-core.

          +1 contrib tests. The patch passed contrib unit tests.

          Test results: https://builds.apache.org/job/PreCommit-MAPREDUCE-Build/3434//testReport/
          Console output: https://builds.apache.org/job/PreCommit-MAPREDUCE-Build/3434//console

          This message is automatically generated.

          Karthik Kambatla added a comment -

          The issue with the previous patch was in the super.reset() call in InMemValBytes#reset(). The adjustment was not required there.

          Here is an updated patch which removes that change.

          I ran all the tests under hadoop-mapreduce-project locally and verified they passed.

          • mvn test: 7 tests failed when run all together
          • ran each of the 7 individually, and they all passed.

          Yet to run a few sample jobs to verify the patch doesn't introduce regressions. Firing up Jenkins only to see what test-patch has to say.

          Robert Joseph Evans added a comment -

          I reverted the changes from branch-0.23 too. Reopening so we can take a look at how to fix the patch.

          Hudson added a comment -

          Integrated in Hadoop-trunk-Commit #3491 (See https://builds.apache.org/job/Hadoop-trunk-Commit/3491/)
          Reverting MAPREDUCE-5028 (commit 1457918) (Revision 1458433)

          Result = SUCCESS
          tucu : http://svn.apache.org/viewcvs.cgi/?root=Apache-SVN&view=rev&rev=1458433
          Files :

          • /hadoop/common/trunk/hadoop-mapreduce-project/CHANGES.txt
          • /hadoop/common/trunk/hadoop-mapreduce-project/hadoop-mapreduce-client/hadoop-mapreduce-client-core/src/main/java/org/apache/hadoop/mapred/MapTask.java
          • /hadoop/common/trunk/hadoop-mapreduce-project/hadoop-mapreduce-client/hadoop-mapreduce-client-core/src/main/java/org/apache/hadoop/mapreduce/task/ReduceContextImpl.java
          • /hadoop/common/trunk/hadoop-mapreduce-project/hadoop-mapreduce-client/hadoop-mapreduce-client-core/src/main/java/org/apache/hadoop/mapreduce/task/reduce/InMemoryReader.java
          Alejandro Abdelnur added a comment -

          Reverted commits from trunk and branch-2.

          We need to fix test-patch; this is not the first time this has happened. It tries to be smarter than it is by running tests only in the modules that changed, when the correct behavior would be to run the tests in all dependent modules. Granted, a change in common would then trigger ALL tests. A compromise might be to run all the tests in the corresponding component (common, hdfs, yarn, mapreduce, tools).

          Karthik Kambatla added a comment -

          I believe I found the issue, TestMRJobs passes locally for me now. However, I am running all MR tests locally for sanity before uploading the patch.

          Karthik Kambatla added a comment -

          Thanks Chris for reporting this. I now see the errors you mention, in my second run. Sorry for the failures, working on identifying the root cause and a fix. Looks like test-patch didn't catch these early because the tests are in client-jobclient and the changes were in client-core.

          Chris Nauroth added a comment -

          Thanks, Karthik. I think you'll need to look into the syslog files output from the map tasks in the target/<test name> sub-directory instead of the test run's stdout. The exception I mentioned was in the syslog files when I ran it.

          I'm guessing that we have a subtle error in the offset calculations when InMemoryReader resets buffers, but I haven't pinned it down specifically yet.

          Karthik Kambatla added a comment -

          Looking into the failures. Sorry if I broke the build.

          I ran TestMRJobs locally - it failed, but shows PrivilegedExceptions (attaching the test output).

          Let me run sample jobs on a pseudo-dist setup and see what the deal is. Before submitting the patch, I did run jobs, but only on large values of io.sort.mb. Will report back soon.

          Hudson added a comment -

          Integrated in Hadoop-Mapreduce-trunk #1377 (See https://builds.apache.org/job/Hadoop-Mapreduce-trunk/1377/)
          MAPREDUCE-5028. Maps fail when io.sort.mb is set to high value. (kkambatl via tucu) (Revision 1457918)

          Result = FAILURE
          tucu : http://svn.apache.org/viewcvs.cgi/?root=Apache-SVN&view=rev&rev=1457918
          Files :

          • /hadoop/common/trunk/hadoop-mapreduce-project/CHANGES.txt
          • /hadoop/common/trunk/hadoop-mapreduce-project/hadoop-mapreduce-client/hadoop-mapreduce-client-core/src/main/java/org/apache/hadoop/mapred/MapTask.java
          • /hadoop/common/trunk/hadoop-mapreduce-project/hadoop-mapreduce-client/hadoop-mapreduce-client-core/src/main/java/org/apache/hadoop/mapreduce/task/ReduceContextImpl.java
          • /hadoop/common/trunk/hadoop-mapreduce-project/hadoop-mapreduce-client/hadoop-mapreduce-client-core/src/main/java/org/apache/hadoop/mapreduce/task/reduce/InMemoryReader.java
          Hudson added a comment -

          Integrated in Hadoop-Hdfs-trunk #1349 (See https://builds.apache.org/job/Hadoop-Hdfs-trunk/1349/)
          MAPREDUCE-5028. Maps fail when io.sort.mb is set to high value. (kkambatl via tucu) (Revision 1457918)

          Result = FAILURE
          tucu : http://svn.apache.org/viewcvs.cgi/?root=Apache-SVN&view=rev&rev=1457918
          Files :

          • /hadoop/common/trunk/hadoop-mapreduce-project/CHANGES.txt
          • /hadoop/common/trunk/hadoop-mapreduce-project/hadoop-mapreduce-client/hadoop-mapreduce-client-core/src/main/java/org/apache/hadoop/mapred/MapTask.java
          • /hadoop/common/trunk/hadoop-mapreduce-project/hadoop-mapreduce-client/hadoop-mapreduce-client-core/src/main/java/org/apache/hadoop/mapreduce/task/ReduceContextImpl.java
          • /hadoop/common/trunk/hadoop-mapreduce-project/hadoop-mapreduce-client/hadoop-mapreduce-client-core/src/main/java/org/apache/hadoop/mapreduce/task/reduce/InMemoryReader.java
          Hudson added a comment -

          Integrated in Hadoop-Yarn-trunk #160 (See https://builds.apache.org/job/Hadoop-Yarn-trunk/160/)
          MAPREDUCE-5028. Maps fail when io.sort.mb is set to high value. (kkambatl via tucu) (Revision 1457918)

          Result = SUCCESS
          tucu : http://svn.apache.org/viewcvs.cgi/?root=Apache-SVN&view=rev&rev=1457918
          Files :

          • /hadoop/common/trunk/hadoop-mapreduce-project/CHANGES.txt
          • /hadoop/common/trunk/hadoop-mapreduce-project/hadoop-mapreduce-client/hadoop-mapreduce-client-core/src/main/java/org/apache/hadoop/mapred/MapTask.java
          • /hadoop/common/trunk/hadoop-mapreduce-project/hadoop-mapreduce-client/hadoop-mapreduce-client-core/src/main/java/org/apache/hadoop/mapreduce/task/ReduceContextImpl.java
          • /hadoop/common/trunk/hadoop-mapreduce-project/hadoop-mapreduce-client/hadoop-mapreduce-client-core/src/main/java/org/apache/hadoop/mapreduce/task/reduce/InMemoryReader.java
          Chris Nauroth added a comment -

          I'm seeing some new test failures since this change went to trunk. One example is TestMRJobs#testDistributedCache. The MR job fails with this in the syslog:

          2013-03-18 23:39:25,065 WARN [main] org.apache.hadoop.mapred.YarnChild: Exception running child : java.io.IOException: Negative value-length not allowed: -6 for org.apache.hadoop.mapred.MapTask$MapOutputBuffer$InMemValBytes@1b1f1f12
                  at org.apache.hadoop.mapred.IFile$Writer.append(IFile.java:243)
                  at org.apache.hadoop.mapred.MapTask$MapOutputBuffer.sortAndSpill(MapTask.java:1597)
                  at org.apache.hadoop.mapred.MapTask$MapOutputBuffer.flush(MapTask.java:1462)
                  at org.apache.hadoop.mapred.MapTask$NewOutputCollector.close(MapTask.java:694)
                  at org.apache.hadoop.mapred.MapTask.runNewMapper(MapTask.java:762)
                  at org.apache.hadoop.mapred.MapTask.run(MapTask.java:339)
                  at org.apache.hadoop.mapred.YarnChild$2.run(YarnChild.java:158)
                  at java.security.AccessController.doPrivileged(Native Method)
                  at javax.security.auth.Subject.doAs(Subject.java:396)
                  at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1489)
                  at org.apache.hadoop.mapred.YarnChild.main(YarnChild.java:153)
          
          Hudson added a comment -

          Integrated in Hadoop-trunk-Commit #3487 (See https://builds.apache.org/job/Hadoop-trunk-Commit/3487/)
          MAPREDUCE-5028. Maps fail when io.sort.mb is set to high value. (kkambatl via tucu) (Revision 1457918)

          Result = SUCCESS
          tucu : http://svn.apache.org/viewcvs.cgi/?root=Apache-SVN&view=rev&rev=1457918
          Files :

          • /hadoop/common/trunk/hadoop-mapreduce-project/CHANGES.txt
          • /hadoop/common/trunk/hadoop-mapreduce-project/hadoop-mapreduce-client/hadoop-mapreduce-client-core/src/main/java/org/apache/hadoop/mapred/MapTask.java
          • /hadoop/common/trunk/hadoop-mapreduce-project/hadoop-mapreduce-client/hadoop-mapreduce-client-core/src/main/java/org/apache/hadoop/mapreduce/task/ReduceContextImpl.java
          • /hadoop/common/trunk/hadoop-mapreduce-project/hadoop-mapreduce-client/hadoop-mapreduce-client-core/src/main/java/org/apache/hadoop/mapreduce/task/reduce/InMemoryReader.java
          Alejandro Abdelnur added a comment -

          Thanks Karthik. Committed to trunk and branch-2.

          Karthik Kambatla added a comment -

          Chris Douglas, will you be able to take a look at the latest patch? Your valuable insights from MAPREDUCE-64 experience will surely help.

          Karthik Kambatla added a comment -

          Thanks, Alejandro. Given this has been committed to branch-1, adding 1.2 to the fix version.

          Alejandro Abdelnur added a comment -

          Thanks Karthik. I've committed the patch for branch-1. Thanks Chris for reviewing it.

          +1 for the trunk patch. I'll wait a bit to see if there are comments from others before committing it.

          Hadoop QA added a comment -

          -1 overall. Here are the results of testing the latest attachment
          http://issues.apache.org/jira/secure/attachment/12571997/mr-5028-trunk.patch
          against trunk revision .

          +1 @author. The patch does not contain any @author tags.

          -1 tests included. The patch doesn't appear to include any new or modified tests.
          Please justify why no new tests are needed for this patch.
          Also please list what manual steps were performed to verify this patch.

          +1 javac. The applied patch does not increase the total number of javac compiler warnings.

          +1 javadoc. The javadoc tool did not generate any warning messages.

          +1 eclipse:eclipse. The patch built with eclipse:eclipse.

          +1 findbugs. The patch does not introduce any new Findbugs (version 1.3.9) warnings.

          +1 release audit. The applied patch does not increase the total number of release audit warnings.

          +1 core tests. The patch passed unit tests in hadoop-mapreduce-project/hadoop-mapreduce-client/hadoop-mapreduce-client-core.

          +1 contrib tests. The patch passed contrib unit tests.

          Test results: https://builds.apache.org/job/PreCommit-MAPREDUCE-Build/3383//testReport/
          Console output: https://builds.apache.org/job/PreCommit-MAPREDUCE-Build/3383//console

          This message is automatically generated.

          Karthik Kambatla added a comment -

          Uploading a patch for trunk that fixes the issue of tasks failing when mapreduce.task.io.sort.mb is set to a high value (~1280 MB).

          • In addition to the inappropriate DataInputBuffer#reset() calls, MapOutputBuffer#setEquator() and MapOutputBuffer#resetSpills() had to be changed to avoid integer overflows in intermediate calculations.
          • I went through the rest of the places where reset() is called and, AFAICT, fixed those that needed to be.
          • Ran a sample wordcount on trunk with dfs.block.size set to 2GB, io.sort.mb = 1280, and container-size = 2.5GB.
          • The patch doesn't add any new tests; I couldn't figure out a way to unit-test the integer overflows.
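
          The kind of intermediate overflow mentioned in the first bullet can be sketched as follows (the variable names and values are illustrative, not taken from MapTask): io.sort.mb = 1280 gives a buffer of 1280 << 20 = 1,342,177,280 bytes, which still fits in an int, but the sum of two offsets into such a buffer can exceed Integer.MAX_VALUE and wrap negative unless the arithmetic is widened to long first.

          ```java
          // Illustrative sketch of int overflow in buffer-offset arithmetic;
          // NOT the actual MapOutputBuffer#setEquator() code.
          public class OffsetOverflowDemo {
              public static void main(String[] args) {
                  final int ioSortMb = 1280;
                  final int bufSize = ioSortMb << 20; // 1342177280 bytes, still fits in an int
                  final int pos = bufSize - 16;       // an offset near the end of the buffer

                  // Adding two large int offsets silently wraps past Integer.MAX_VALUE:
                  int badSum = pos + bufSize;         // wraps to -1610612752
                  // Widening to long before the addition gives the intended value:
                  long goodSum = (long) pos + bufSize;

                  System.out.println(badSum);         // -1610612752
                  System.out.println(goodSum);        // 2684354544
              }
          }
          ```

          A wrapped offset like badSum is exactly the sort of value that later surfaces as a "Negative value-length not allowed" error or a spurious EOFException during spill.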
          Show
          Karthik Kambatla added a comment - Uploading a patch for trunk that fixes the issue of tasks failing when mapreduce.task.io.sort.mb is set to a high value ~ 1280 MB. In addition to inappropriate DataInputBuffer#reset() calls, MapOutputBuffer#setEquator() and MapOutputBuffer#resetSpills() had to be changed to accommodate integer overflows in the middle of calculations. I went through rest of the cases where reset() is called, and AFAICT fixed those that need to be. Ran a sample wordcount on trunk with dfs.block.size set to 2GB, io.sort.mb = 1280 and container-size=2.5GB The patch doesn't add any new tests; I couldn't figure out a way to test integer overflows.
          Karthik Kambatla added a comment -

          Uploading a patch on branch-1 removing the comment changes in DataInputBuffer. The current patch fixes the reported bug.

          Chris Douglas added a comment -

          I'd rather leave DataInputBuffer alone. But if the current doc really bothers you, please use formatting consistent with the rest of the project.

          +1 on the changes to 1.x, but changing the scope of the issue to fix all misuses of DataInputBuffer::reset in 2.x and trunk would be preferred. This issue doesn't cover all errors for large sort buffers, and separate issues for each branch are avoidable. But I won't block a bug fix.

          Karthik Kambatla added a comment -

          Chris Douglas, do you think the newly added comments are fine and the patch good to go?

          Hadoop QA added a comment -

          -1 overall. Here are the results of testing the latest attachment
          http://issues.apache.org/jira/secure/attachment/12571090/mr-5028-branch1.patch
          against trunk revision .

          -1 patch. The patch command could not apply the patch.

          Console output: https://builds.apache.org/job/PreCommit-MAPREDUCE-Build/3365//console

          This message is automatically generated.

          Karthik Kambatla added a comment -

          The -1 from Hadoop QA is because the uploaded patch is for branch-1, so the precommit build could not apply it against trunk. I have verified that ant test-core passes.

          Karthik Kambatla added a comment -

          Thanks for your comments, Chris.

          > Just to be clear, it's not the size of the backing array, but "the index one greater than the last valid character in the input stream buffer." (ByteArrayInputStream) The change to DataInputBuffer implies the former, which is inaccurate.

          Removed that comment, and added comments to DataInputBuffer#reset() to reflect ByteArrayInputStream.

          > There's another misuse at ReduceContextImpl.ValueIterator::next that could be included with these changes.

          If I am not mistaken, ReduceContextImpl exists only on trunk-based branches, and this patch is only for branch-1. I filed MAPREDUCE-5031 earlier to address the slightly different issues on trunk; it can be marked a duplicate of the one you created. I have a half-baked patch and can submit it there.

          Chris Douglas added a comment -

          > getLength() returns the size of the entire buffer, and not just the remaining part of the buffer

          Just to be clear, it's not the size of the backing array, but "the index one greater than the last valid character in the input stream buffer." (ByteArrayInputStream) The change to DataInputBuffer implies the former, which is inaccurate.

          The patch corrects several misuses of DataInputBuffer, which is great. There's another misuse at ReduceContextImpl.ValueIterator::next that could be included with these changes.

          Most of this code doesn't check for overflow; it wasn't written for extremely large buffers. Just glancing at related code, MapTask.InMemValBytes contains code that could overflow, and I'm sure there are others. Filed MAPREDUCE-5032.

          Hadoop QA added a comment -

          -1 overall. Here are the results of testing the latest attachment
          http://issues.apache.org/jira/secure/attachment/12570924/mr-5028-branch1.patch
          against trunk revision .

          -1 patch. The patch command could not apply the patch.

          Console output: https://builds.apache.org/job/PreCommit-MAPREDUCE-Build/3360//console

          This message is automatically generated.

          Karthik Kambatla added a comment -

          Ran into a different bug while trying to reproduce this on trunk. Filed MAPREDUCE-5031 for the same. Will port the fix from here if need be.

          Karthik Kambatla added a comment -

          Uploading a patch that fixes the issue.

          It turns out that buggy use of DataInputBuffer.reset() and DataInputBuffer.getLength() leads to integer overflow at such large values of io.sort.mb.

          getLength() returns the size of the entire buffer, and not just the remaining part of the buffer. The offending code assumes otherwise leading to this issue.

          For instance, suppose a key starts at position 1,224,906,830 and extends to 1,224,906,868. The length of this key should be set to 38; the code instead sets it to 1,224,906,868. The data buffer then interprets the key as ending at 2,449,813,698, which exceeds 2^31 - 1 and wraps around to a negative value.

          For this to happen, the starting position of the key must be large: greater than (2^31 - 1 - key_size)/2 = 1,073,741,823.5 - key_size/2.

          A 1280 MB sort buffer (1,342,177,280 bytes) allows key positions beyond this threshold, hence the reported failure.

          The patch fixes the use of reset() and getLength() as required, and also updates the javadoc of getLength().

          Verified that the patch fixes the problem for the same cases as in the description. Not sure how to write a test for this.
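The arithmetic in the diagnosis above can be reproduced with plain int values. This is a sketch of the failure mode only, not the actual MapTask code, using the offsets quoted in the comment:

```java
public class SpillOverflowDemo {
    public static void main(String[] args) {
        // Offsets from the comment: a key spanning
        // [1,224,906,830 , 1,224,906,868) in a ~1280 MB sort buffer.
        int keyStart = 1_224_906_830;
        int keyEnd   = 1_224_906_868;

        // DataInputBuffer.reset(buf, start, length) expects a *length*:
        int correctLength = keyEnd - keyStart;
        System.out.println(correctLength);   // 38

        // The buggy call sites passed the end offset instead. The buffer
        // then computes count = start + length (ByteArrayInputStream
        // semantics), which exceeds 2^31 - 1 and wraps negative:
        int buggyCount = keyStart + keyEnd;
        System.out.println(buggyCount);      // -1845153598
    }
}
```

Once count goes negative, reads that compare the current position against it fail immediately, surfacing as the EOFException inside the spill shown in the description.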

          Karthik Kambatla added a comment -

          http://comments.gmane.org/gmane.comp.java.hadoop.mapreduce.user/2485 looks like the same issue.

            People

            • Assignee:
              Karthik Kambatla
            • Reporter:
              Karthik Kambatla
            • Votes: 0
            • Watchers: 21