Hadoop Map/Reduce
  MAPREDUCE-1777

In streaming, jobs that used to work crash in the map phase -- even if the mapper is /bin/cat

    Details

    • Type: Bug
    • Status: Resolved
    • Priority: Major
    • Resolution: Not a Problem
    • Affects Version/s: None
    • Fix Version/s: None
    • Component/s: contrib/streaming
    • Labels:
      None

      Description

      The exception is either "out of memory" or "broken pipe"; see both stack dumps below.

      last Hadoop input: |null|
      last tool output: |[B@20fa83|
      Date: Sat Dec 15 21:02:18 UTC 2007
      java.io.IOException: Broken pipe
      at java.io.FileOutputStream.writeBytes(Native Method)
      at java.io.FileOutputStream.write(FileOutputStream.java:260)
      at java.io.BufferedOutputStream.flushBuffer(BufferedOutputStream.java:65)
      at java.io.BufferedOutputStream.flush(BufferedOutputStream.java:123)
      at java.io.BufferedOutputStream.flush(BufferedOutputStream.java:124)
      at java.io.DataOutputStream.flush(DataOutputStream.java:106)
      at org.apache.hadoop.streaming.PipeMapper.map(PipeMapper.java:96)
      at org.apache.hadoop.mapred.MapRunner.run(MapRunner.java:50)
      at org.apache.hadoop.mapred.MapTask.run(MapTask.java:192)
      at org.apache.hadoop.mapred.TaskTracker$Child.main(TaskTracker.java:1760)

      at org.apache.hadoop.streaming.PipeMapper.map(PipeMapper.java:107)
      at org.apache.hadoop.mapred.MapRunner.run(MapRunner.java:50)
      at org.apache.hadoop.mapred.MapTask.run(MapTask.java:192)
      at org.apache.hadoop.mapred.TaskTracker$Child.main(TaskTracker.java:1760)

      -------------------------------------------------
      java.io.IOException: MROutput/MRErrThread
      failed:java.lang.OutOfMemoryError: Java heap space
      at java.util.Arrays.copyOf(Arrays.java:2786)
      at java.io.ByteArrayOutputStream.write(ByteArrayOutputStream.java:94)
      at java.io.DataOutputStream.write(DataOutputStream.java:90)
      at org.apache.hadoop.io.Text.write(Text.java:243)
      at org.apache.hadoop.mapred.MapTask$MapOutputBuffer.collect(MapTask.java:347)
      at org.apache.hadoop.streaming.PipeMapRed$MROutputThread.run(PipeMapRed.java:344)

      at org.apache.hadoop.streaming.PipeMapper.map(PipeMapper.java:76)
      at org.apache.hadoop.mapred.MapRunner.run(MapRunner.java:50)
      at org.apache.hadoop.mapred.MapTask.run(MapTask.java:192)
      at org.apache.hadoop.mapred.TaskTracker$Child.main(TaskTracker.java:1760)

        Activity

        Arun C Murthy added a comment -

        Arkady, what is the max java heapsize for the child when you notice this OOM?
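For context on Arun's question: in Hadoop of this era the per-task child JVM heap was set via the mapred.child.java.opts property (default -Xmx200m). A sketch of how one would raise it in mapred-site.xml (the 512m value is illustrative, not a recommendation from this issue):

```xml
<!-- mapred-site.xml: raise the per-task child JVM heap; value is illustrative -->
<property>
  <name>mapred.child.java.opts</name>
  <value>-Xmx512m</value>
</property>
```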

        Runping Qi added a comment -

        If the stream mapper stalls for some reason and cannot consume its standard input
        while the Java MapRed wrapper continues to pipe data to it, too much data may
        accumulate in the stdin pipe.
        That can cause a broken-pipe or out-of-memory exception. What did the mapper do?
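The failure mode Runping describes can be reproduced outside Hadoop with a short sketch (hypothetical demo, not Hadoop code): a child process that stops reading its stdin while the parent keeps writing eventually causes a broken-pipe error in the writer, analogous to the PipeMapper stack trace above.

```python
import subprocess

# Child reads a single byte and exits, simulating a mapper that stops
# consuming its standard input.
child = subprocess.Popen(["head", "-c", "1"],
                         stdin=subprocess.PIPE,
                         stdout=subprocess.DEVNULL)

broken = False
try:
    # Parent keeps piping data, like the Java MapRed wrapper; once the
    # child has exited and the pipe buffer fills, the write fails.
    for _ in range(100000):
        child.stdin.write(b"x" * 1024)
        child.stdin.flush()
except BrokenPipeError:
    broken = True
child.wait()
print("broken pipe observed:", broken)
```

The analogous failure in streaming surfaces as the java.io.IOException: Broken pipe seen in the first stack dump; if the buffered data instead piles up on the Java side, it can surface as the OutOfMemoryError in the second.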

        Krishna Ramachandran added a comment -

        ran a simple streaming job (/bin/cat) using trunk and did not see any problem

        hadoop jar ./hadoop-0.22.0-alpha-1-streaming.jar -input /tmp/ramach -output /tmp/out1 -mapper /bin/cat -reducer NONE

        it ran fine

        10/08/19 02:36:51 INFO streaming.StreamJob: Tracking URL: http://ucdev11.inktomisearch.com:50030/jobdetails.jsp?jobid=job_201008060139_0059
        10/08/19 02:36:51 INFO mapreduce.Job: Running job: job_201008060139_0059
        10/08/19 02:36:52 INFO mapreduce.Job: map 0% reduce 0%
        10/08/19 02:37:08 INFO mapreduce.Job: map 100% reduce 0%
        10/08/19 02:37:13 INFO mapreduce.Job: Job complete: job_201008060139_0059
        10/08/19 02:37:13 INFO streaming.StreamJob: Output directory: /tmp/out1

        Krishna Ramachandran added a comment -

        This has been sitting for a while and no longer appears to be a problem. I could not reproduce it.
        Closing for now.


          People

          • Assignee: Unassigned
          • Reporter: arkady borkovsky
          • Votes: 1
          • Watchers: 4

            Dates

            • Created:
              Updated:
              Resolved:
