Hadoop Common
  1. Hadoop Common
  2. HADOOP-4620

Streaming mapper never completes if the mapper does not write to stdout

    Details

    • Type: Bug Bug
    • Status: Closed
    • Priority: Major Major
    • Resolution: Fixed
    • Affects Version/s: 0.17.2
    • Fix Version/s: 0.18.3
    • Component/s: None
    • Labels:
      None
    • Hadoop Flags:
      Reviewed
    • Release Note:
      Hide
      This patch HADOOP-4620.patch
      (1) solves the hanging problem on map side with empty input and nonempty output — this map task generates output properly to intermediate files similar to other map tasks.
      (2) solves the problem of hanging reducer with empty input to reduce task and nonempty output — this reduce task doesn't generate output if input to reduce task is empty.
      Show
      This patch HADOOP-4620 .patch (1) solves the hanging problem on map side with empty input and nonempty output — this map task generates output properly to intermediate files similar to other map tasks. (2) solves the problem of hanging reducer with empty input to reduce task and nonempty output — this reduce task doesn't generate output if input to reduce task is empty.

      Description

      A mapper of a streaming job has empty input data and thus it produces no output.
      The task never completes.

      The following are the last two lines from the task log:
      2008-11-07 21:59:48,254 INFO org.apache.hadoop.streaming.PipeMapRed: PipeMapRed exec [/usr/bin/perl, xxx]
      2008-11-07 21:59:48,330 INFO org.apache.hadoop.streaming.PipeMapRed: mapRedFinished

      1. solves_mapper_4620.patch
        5 kB
        Ravi Gummadi
      2. HADOOP-4620.patch
        10 kB
        Ravi Gummadi
      3. HADOOP17-4620.patch
        10 kB
        Ravi Gummadi

        Issue Links

          Activity

          Hide
          Devaraj Das added a comment -

          I just committed this to 0.18, 0.19 branches and trunk. Thanks, Ravi!

          Show
          Devaraj Das added a comment - I just committed this to 0.18, 0.19 branches and trunk. Thanks, Ravi!
          Hide
          Owen O'Malley added a comment -

          +1 for putting this into 0.18.3

          Show
          Owen O'Malley added a comment - +1 for putting this into 0.18.3
          Hide
          Sharad Agarwal added a comment -

          +1

          Show
          Sharad Agarwal added a comment - +1
          Hide
          Ravi Gummadi added a comment -

          HADOOP17-4620.patch is for branch 17. Patch HADOOP-4620.patch applies fine to trunk and version 0.18.2
          Here are the results of testing HADOOP-4620.patch :

          +1 overall.

          +1 @author. The patch does not contain any @author tags.

          +1 tests included. The patch appears to include 3 new or modified tests.

          +1 javadoc. The javadoc tool did not generate any warning messages.

          +1 javac. The applied patch does not increase the total number of javac compiler warnings.

          +1 findbugs. The patch does not introduce any new Findbugs warnings.

          +1 Eclipse classpath. The patch retains Eclipse classpath integrity.

          Unit tests also passed on my machine.

          Show
          Ravi Gummadi added a comment - HADOOP17-4620.patch is for branch 17. Patch HADOOP-4620 .patch applies fine to trunk and version 0.18.2 Here are the results of testing HADOOP-4620 .patch : +1 overall. +1 @author. The patch does not contain any @author tags. +1 tests included. The patch appears to include 3 new or modified tests. +1 javadoc. The javadoc tool did not generate any warning messages. +1 javac. The applied patch does not increase the total number of javac compiler warnings. +1 findbugs. The patch does not introduce any new Findbugs warnings. +1 Eclipse classpath. The patch retains Eclipse classpath integrity. Unit tests also passed on my machine.
          Hide
          Hadoop QA added a comment -

          -1 overall. Here are the results of testing the latest attachment
          http://issues.apache.org/jira/secure/attachment/12395526/HADOOP17-4620.patch
          against trunk revision 724578.

          +1 @author. The patch does not contain any @author tags.

          +1 tests included. The patch appears to include 3 new or modified tests.

          -1 patch. The patch command could not apply the patch.

          Console output: http://hudson.zones.apache.org/hudson/job/Hadoop-Patch/3695/console

          This message is automatically generated.

          Show
          Hadoop QA added a comment - -1 overall. Here are the results of testing the latest attachment http://issues.apache.org/jira/secure/attachment/12395526/HADOOP17-4620.patch against trunk revision 724578. +1 @author. The patch does not contain any @author tags. +1 tests included. The patch appears to include 3 new or modified tests. -1 patch. The patch command could not apply the patch. Console output: http://hudson.zones.apache.org/hudson/job/Hadoop-Patch/3695/console This message is automatically generated.
          Hide
          Ravi Gummadi added a comment -

          The patch for trunk(HADOOP-4620.patch) applies to version 18 also without error.
          Attaching now the patch for version-0.17 HADOOP17-4620.patch.

          Show
          Ravi Gummadi added a comment - The patch for trunk( HADOOP-4620 .patch) applies to version 18 also without error. Attaching now the patch for version-0.17 HADOOP17-4620.patch.
          Hide
          Ravi Gummadi added a comment -

          Submitting the patch....

          Show
          Ravi Gummadi added a comment - Submitting the patch....
          Hide
          Ravi Gummadi added a comment -

          Attached the patch HADOOP-4620.patch that
          (1) solves the hanging problem on map side with empty input and nonempty output — generates output properly to intermediate files similar to other map tasks.
          (2) solves the problem of hanging reducer with empty input to reduce task and nonempty output — doesn't generate output if input to reduce task is empty.

          Please review the patch and provide your comments. Thanks.

          Show
          Ravi Gummadi added a comment - Attached the patch HADOOP-4620 .patch that (1) solves the hanging problem on map side with empty input and nonempty output — generates output properly to intermediate files similar to other map tasks. (2) solves the problem of hanging reducer with empty input to reduce task and nonempty output — doesn't generate output if input to reduce task is empty. Please review the patch and provide your comments. Thanks.
          Hide
          Runping Qi added a comment -

          +1

          Show
          Runping Qi added a comment - +1
          Hide
          Devaraj Das added a comment -

          I'd propose that for Reducers, we solve only the "hang" problem for the empty input case. I don't think it is required to support the collector for reduces with empty inputs. A reducer is supposed to aggregate/reduce data that the maps generate for it and if the maps didn't generate anything for a given reducer it seems okay that it should not generate any output.
          For the maps, the picture is a bit different with empty inputs and we should support it. The use case here is Hadoop enables parallelizing the native application (where the application could be reading its input off some source which Hadoop is not aware of), or, it just enables running multiple instances of the native application on a cluster. This use case might set the number of reduces to 0.
          What do others think?

          Show
          Devaraj Das added a comment - I'd propose that for Reducers, we solve only the "hang" problem for the empty input case. I don't think it is required to support the collector for reduces with empty inputs. A reducer is supposed to aggregate/reduce data that the maps generate for it and if the maps didn't generate anything for a given reducer it seems okay that it should not generate any output. For the maps, the picture is a bit different with empty inputs and we should support it. The use case here is Hadoop enables parallelizing the native application (where the application could be reading its input off some source which Hadoop is not aware of), or, it just enables running multiple instances of the native application on a cluster. This use case might set the number of reduces to 0. What do others think?
          Hide
          Sharad Agarwal added a comment -

          Had a discussion on this Devaraj. It seems there aren't reasonable usecase which may require outputting from reducer with no input. It should be ok to not output anything from reducer with no input. However hang problem should be fixed for it as well.

          Show
          Sharad Agarwal added a comment - Had a discussion on this Devaraj. It seems there aren't reasonable usecase which may require outputting from reducer with no input. It should be ok to not output anything from reducer with no input. However hang problem should be fixed for it as well.
          Hide
          Ravi Gummadi added a comment -

          The attached patch solves the hanging issue on map side only . It doesn't solve the problem of reducer hanging when 0 sized input is given and reducer generates nonzero sized output.

          Show
          Ravi Gummadi added a comment - The attached patch solves the hanging issue on map side only . It doesn't solve the problem of reducer hanging when 0 sized input is given and reducer generates nonzero sized output.
          Hide
          Devaraj Das added a comment -

          if mapper has no data to handle. We could not start native process. This will improve performance and avoid this problem.

          -1. I think we should start the native process anyways.

          Show
          Devaraj Das added a comment - if mapper has no data to handle. We could not start native process. This will improve performance and avoid this problem. -1. I think we should start the native process anyways.
          Hide
          Ruyue Ma added a comment - - edited

          if mapper has no data to handle. We could not start native process. This will improve performance and avoid this problem.

          so i suggest that it should start the native process in PiperMap->map().

          we can move PipeMapRed part code to map function

          // Start the process
          ProcessBuilder builder = new ProcessBuilder(argvSplit);
          builder.environment().putAll(childEnv.toMap());
          sim = builder.start();

          clientOut_ = new DataOutputStream(new BufferedOutputStream(
          sim.getOutputStream(),
          BUFFER_SIZE));
          clientIn_ = new DataInputStream(new BufferedInputStream(
          sim.getInputStream(),
          BUFFER_SIZE));
          clientErr_ = new DataInputStream(new BufferedInputStream(sim.getErrorStream()));
          startTime_ = System.currentTimeMillis();

          errThread_ = new MRErrorThread();
          errThread_.start();

          Show
          Ruyue Ma added a comment - - edited if mapper has no data to handle. We could not start native process. This will improve performance and avoid this problem. so i suggest that it should start the native process in PiperMap->map(). we can move PipeMapRed part code to map function // Start the process ProcessBuilder builder = new ProcessBuilder(argvSplit); builder.environment().putAll(childEnv.toMap()); sim = builder.start(); clientOut_ = new DataOutputStream(new BufferedOutputStream( sim.getOutputStream(), BUFFER_SIZE)); clientIn_ = new DataInputStream(new BufferedInputStream( sim.getInputStream(), BUFFER_SIZE)); clientErr_ = new DataInputStream(new BufferedInputStream(sim.getErrorStream())); startTime_ = System.currentTimeMillis(); errThread_ = new MRErrorThread(); errThread_.start();
          Hide
          Ravi Gummadi added a comment -

          Yes Runping. I have a patch ready with those changes(including setting map runner in StreamJob). Its working fine for mappers — not hanging and giving proper output. Investigating the similar issue on the reducer side.

          Show
          Ravi Gummadi added a comment - Yes Runping. I have a patch ready with those changes(including setting map runner in StreamJob). Its working fine for mappers — not hanging and giving proper output. Investigating the similar issue on the reducer side.
          Hide
          Runping Qi added a comment -

          I see.

          If we take this approach, the StreamJob needs to set the map runner class too when it sets the mapper class.

          Show
          Runping Qi added a comment - I see. If we take this approach, the StreamJob needs to set the map runner class too when it sets the mapper class.
          Hide
          Ravi Gummadi added a comment -

          Hi Runping, This is because OutputCollector and Reporter are parameters to MROutputThread(), but they are not available in configure().

          Show
          Ravi Gummadi added a comment - Hi Runping, This is because OutputCollector and Reporter are parameters to MROutputThread(), but they are not available in configure().
          Hide
          Runping Qi added a comment -

          Why not simply start the MROutThread in the config function in the same way as for MRErrorThread?

          Show
          Runping Qi added a comment - Why not simply start the MROutThread in the config function in the same way as for MRErrorThread?
          Hide
          Ravi Gummadi added a comment -

          Adding the following new class in streaming and starting the output thread in its run() method solves the problem of mapper hanging(when 0 byte input & nonzero output).

          +public class PipeMapRunner<K1, V1, K2, V2> extends MapRunner<K1, V1, K2, V2> {
          + public void run(RecordReader<K1, V1> input, OutputCollector<K2, V2> output,
          + Reporter reporter)
          + throws IOException

          { + PipeMapper pipeMapper = (PipeMapper)getMapper(); + pipeMapper.startOutputThreads(output, reporter); + super.run(input, output, reporter); + }

          +}

          Investigating if similar thing can be done in Reduce phase also(Reducer also hangs if input is of size 0 bytes and if it produces output).
          Thoughts ?

          Show
          Ravi Gummadi added a comment - Adding the following new class in streaming and starting the output thread in its run() method solves the problem of mapper hanging(when 0 byte input & nonzero output). +public class PipeMapRunner<K1, V1, K2, V2> extends MapRunner<K1, V1, K2, V2> { + public void run(RecordReader<K1, V1> input, OutputCollector<K2, V2> output, + Reporter reporter) + throws IOException { + PipeMapper pipeMapper = (PipeMapper)getMapper(); + pipeMapper.startOutputThreads(output, reporter); + super.run(input, output, reporter); + } +} Investigating if similar thing can be done in Reduce phase also(Reducer also hangs if input is of size 0 bytes and if it produces output). Thoughts ?
          Hide
          Ruyue Ma added a comment -

          I agree with Ravi.

          if mapper will generate data, we will allow it, or report error. Hang is not accepted!

          Thanks

          Show
          Ruyue Ma added a comment - I agree with Ravi. if mapper will generate data, we will allow it, or report error. Hang is not accepted! Thanks
          Hide
          Ravi Gummadi added a comment -

          The ideal behaviour could be to have the correct output in this case also(input is of 0 size for mapper/reducer and the task generates output).

          Is it good enough that the mapper/reducer doesn't hang and finish execution in the case of 0 sized input(even though the output of the mapper/reducer is not available in intermediateFile/HDFS at the end of the task) ?

          Show
          Ravi Gummadi added a comment - The ideal behaviour could be to have the correct output in this case also(input is of 0 size for mapper/reducer and the task generates output). Is it good enough that the mapper/reducer doesn't hang and finish execution in the case of 0 sized input(even though the output of the mapper/reducer is not available in intermediateFile/HDFS at the end of the task) ?
          Hide
          Ravi Gummadi added a comment -

          Thanks. I could reproduce the problem now.

          Show
          Ravi Gummadi added a comment - Thanks. I could reproduce the problem now.
          Hide
          Ruyue Ma added a comment -

          Hi, Ravi

          You can re-produce the problem.

          copy a zero-length file to hdfs. Then run streaming job for this file. You know mapper will handle zero-length data, if the mapper still produce out, if the out is larger than 4KB, the mapper task will hang.

          The reason is: if mapper input data is zero, streaming would not start map output thread, but open the mapper's stdout pipe, the pipe's size defaultly is 4KB. If mapper doesn't generate data large than 4KB, mapper will end normally. But if larger than 4KB, it will hang.

          If you guys think it is bug, I can submit a patch to correct it.

          As far as I know, the latest revision resolve the problem for stderr output thread, but not for stdout output thread.

          Thanks

          Show
          Ruyue Ma added a comment - Hi, Ravi You can re-produce the problem. copy a zero-length file to hdfs. Then run streaming job for this file. You know mapper will handle zero-length data, if the mapper still produce out, if the out is larger than 4KB, the mapper task will hang. The reason is: if mapper input data is zero, streaming would not start map output thread, but open the mapper's stdout pipe, the pipe's size defaultly is 4KB. If mapper doesn't generate data large than 4KB, mapper will end normally. But if larger than 4KB, it will hang. If you guys think it is bug, I can submit a patch to correct it. As far as I know, the latest revision resolve the problem for stderr output thread, but not for stdout output thread. Thanks
          Hide
          Koji Noguchi added a comment -

          I've seen streaming hang when

          1. Empty input
          2. Streaming process still tries to output.

          This is because MROutputThread is only created when first record is passed to the map().
          No input, thus no map(), thus no MROutputThread, and streaming-process stdout write hang forever.
          (in 0.17, we still had the timeout of 0(infinity))

          Show
          Koji Noguchi added a comment - I've seen streaming hang when Empty input Streaming process still tries to output. This is because MROutputThread is only created when first record is passed to the map(). No input, thus no map(), thus no MROutputThread, and streaming-process stdout write hang forever. (in 0.17, we still had the timeout of 0(infinity))
          Hide
          Ravi Gummadi added a comment -

          So there is no issue of tasks hanging forever(even if the size of the map output is 0 bytes). I agree that the message "mapRedFinished" is displayed before output thread and error thread have completed their work. Log messages will be changed to be displayed appopriately.

          Otherwise, please provide a testcase that reproduces the problem of Map Task hanging. I am not able to reproduce the problem of hanging map task.

          Show
          Ravi Gummadi added a comment - So there is no issue of tasks hanging forever(even if the size of the map output is 0 bytes). I agree that the message "mapRedFinished" is displayed before output thread and error thread have completed their work. Log messages will be changed to be displayed appopriately. Otherwise, please provide a testcase that reproduces the problem of Map Task hanging. I am not able to reproduce the problem of hanging map task.
          Hide
          Runping Qi added a comment -

          I double checked the jobs behaving like that.
          The mapping child process (the perl script) did not exit (thus the task never completed because the user process never completed).
          I think the messages in the log are misleading.
          Maybe we just need to fix the logging part.

          Show
          Runping Qi added a comment - I double checked the jobs behaving like that. The mapping child process (the perl script) did not exit (thus the task never completed because the user process never completed). I think the messages in the log are misleading. Maybe we just need to fix the logging part.

            People

            • Assignee:
              Ravi Gummadi
              Reporter:
              Runping Qi
            • Votes:
              0 Vote for this issue
              Watchers:
              2 Start watching this issue

              Dates

              • Created:
                Updated:
                Resolved:

                Development