Hadoop Map/Reduce / MAPREDUCE-5028

Maps fail when io.sort.mb is set to high value


Details

    • Type: Bug
    • Status: Closed
    • Priority: Critical
    • Resolution: Fixed
    • Affects Version/s: 1.1.1, 2.0.3-alpha, 0.23.5
    • Fix Version/s: 1.2.0, 2.4.0
    • Component/s: None
    • Labels: None

    Description

      Verified the problem exists on branch-1 with the following configuration:

      Pseudo-dist mode: 2 maps/ 1 reduce, mapred.child.java.opts=-Xmx2048m, io.sort.mb=1280, dfs.block.size=2147483648
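
      The settings above can be written out as site configuration. This is a hedged rendering of exactly the values listed in the reproduction, assuming the classic branch-1 property names given there (dfs.block.size belongs in hdfs-site.xml rather than mapred-site.xml):

      ```xml
      <!-- mapred-site.xml: values taken from the reproduction above -->
      <property>
        <name>mapred.child.java.opts</name>
        <value>-Xmx2048m</value>
      </property>
      <property>
        <name>io.sort.mb</name>
        <value>1280</value>
      </property>

      <!-- hdfs-site.xml -->
      <property>
        <name>dfs.block.size</name>
        <value>2147483648</value>
      </property>
      ```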

      Run teragen to generate 4 GB of data, then run wordcount against it. With this configuration, maps fail with the following error:

      java.io.IOException: Spill failed
      	at org.apache.hadoop.mapred.MapTask$MapOutputBuffer.collect(MapTask.java:1031)
      	at org.apache.hadoop.mapred.MapTask$NewOutputCollector.write(MapTask.java:692)
      	at org.apache.hadoop.mapreduce.TaskInputOutputContext.write(TaskInputOutputContext.java:80)
      	at org.apache.hadoop.examples.WordCount$TokenizerMapper.map(WordCount.java:45)
      	at org.apache.hadoop.examples.WordCount$TokenizerMapper.map(WordCount.java:34)
      	at org.apache.hadoop.mapreduce.Mapper.run(Mapper.java:144)
      	at org.apache.hadoop.mapred.MapTask.runNewMapper(MapTask.java:766)
      	at org.apache.hadoop.mapred.MapTask.run(MapTask.java:370)
      	at org.apache.hadoop.mapred.Child$4.run(Child.java:255)
      	at java.security.AccessController.doPrivileged(Native Method)
      	at javax.security.auth.Subject.doAs(Subject.java:396)
      	at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1149)
      	at org.apache.hadoop.mapred.Child.main(Child.java:249)
      Caused by: java.io.EOFException
      	at java.io.DataInputStream.readInt(DataInputStream.java:375)
      	at org.apache.hadoop.io.IntWritable.readFields(IntWritable.java:38)
      	at org.apache.hadoop.io.serializer.WritableSerialization$WritableDeserializer.deserialize(WritableSerialization.java:67)
      	at org.apache.hadoop.io.serializer.WritableSerialization$WritableDeserializer.deserialize(WritableSerialization.java:40)
      	at org.apache.hadoop.mapreduce.ReduceContext.nextKeyValue(ReduceContext.java:116)
      	at org.apache.hadoop.mapreduce.ReduceContext.nextKey(ReduceContext.java:92)
      	at org.apache.hadoop.mapreduce.Reducer.run(Reducer.java:175)
      	at org.apache.hadoop.mapred.Task$NewCombinerRunner.combine(Task.java:1505)
      	at org.apache.hadoop.mapred.MapTask$MapOutputBuffer.sortAndSpill(MapTask.java:1438)
      	at org.apache.hadoop.mapred.MapTask$MapOutputBuffer.access$1800(MapTask.java:855)
      	at org.apache.hadoop.mapred.MapTask$MapOutputBuffer$SpillThread.run(MapTask.java:1346)
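
      The EOFException thrown while the combiner deserializes records during sortAndSpill is a symptom of corrupted indices in the map-side circular sort buffer, not a genuine end-of-stream. As a minimal sketch of the arithmetic failure class involved (the method and constant names below are illustrative, not Hadoop's actual code): with a sort buffer larger than roughly 1 GB, a circular-index computation of the form `(pos - METASIZE + length) % length` can exceed `Integer.MAX_VALUE` in the intermediate sum, wrap negative, and yield an invalid index.

      ```java
      public class EquatorOverflowSketch {
          // Hypothetical 16-byte metadata record size, mirroring the map-side
          // sort buffer's fixed-size accounting records. Illustrative only.
          static final int METASIZE = 16;

          // Signed 32-bit arithmetic: the intermediate sum (aligned - METASIZE + len)
          // can exceed Integer.MAX_VALUE when len is larger than ~1 GB, wrapping
          // negative before the modulo is applied, so the "index" comes out negative.
          static int nextIndexInt(int aligned, int len) {
              return (aligned - METASIZE + len) % len;
          }

          // Widening to long before the modulo keeps the intermediate sum exact,
          // then the result safely fits back into an int (it is in [0, len)).
          static int nextIndexLong(int aligned, int len) {
              return (int) (((long) aligned - METASIZE + len) % len);
          }

          public static void main(String[] args) {
              int len = 1280 * 1024 * 1024;  // 1342177280 bytes, i.e. io.sort.mb=1280
              int aligned = len - METASIZE;  // a position near the top of the buffer
              System.out.println(nextIndexInt(aligned, len));   // negative: overflowed
              System.out.println(nextIndexLong(aligned, len));  // valid in-range index
          }
      }
      ```

      With a 1280 MB buffer the int variant returns a negative offset, while the widened variant returns the expected in-range position; reading key/value bytes from a garbage offset is one way the combiner's deserializer can hit an unexpected EOF as seen in the trace above.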
      

      Attachments

        1. mr-5028-3.patch
          20 kB
          Karthik Kambatla
        2. mr-5028-2.patch
          16 kB
          Karthik Kambatla
        3. mr-5028-1.patch
          16 kB
          Karthik Kambatla
        4. MR-5028_testapp.patch
          11 kB
          Arun Murthy
        5. repro-mr-5028.patch
          9 kB
          Karthik Kambatla
        6. mr-5028-trunk.patch
          5 kB
          Karthik Kambatla
        7. mr-5028-trunk.patch
          4 kB
          Karthik Kambatla
        8. mr-5028-trunk.patch
          4 kB
          Karthik Kambatla
        9. mr-5028-branch1.patch
          2 kB
          Karthik Kambatla
        10. mr-5028-branch1.patch
          3 kB
          Karthik Kambatla
        11. mr-5028-branch1.patch
          3 kB
          Karthik Kambatla


            People

              Assignee: Karthik Kambatla (kasha)
              Reporter: Karthik Kambatla (kasha)
              Votes: 0
              Watchers: 20
