Hadoop Map/Reduce: MAPREDUCE-3790

Broken pipe on streaming job can lead to truncated output for a successful job

    Details

    • Type: Bug
    • Status: Resolved
    • Priority: Major
    • Resolution: Fixed
    • Affects Version/s: 0.23.1, 2.0.0-alpha
    • Fix Version/s: 0.23.2
    • Component/s: contrib/streaming, mrv2
    • Labels:
      None

      Description

      If a streaming job doesn't consume all of its input, the job can be marked successful even though its output is truncated.

      Here's a simple setup that can exhibit the problem. Note that the job output will most likely be truncated compared to the same job run with a zero-length input file.

      $ hdfs dfs -cat in
      foo
      $ yarn jar ./share/hadoop/tools/lib/hadoop-streaming-0.24.0-SNAPSHOT.jar -Dmapred.map.tasks=1 -Dmapred.reduce.tasks=1 -mapper /bin/env -reducer NONE -input in -output out
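`/bin/env` prints the environment and exits without ever reading stdin, so the framework is left writing input to a process that is no longer listening. A Java mapper with the same shape might look like the hypothetical sketch below (`IgnoreInputMapper` and `emit` are illustrative names; this is not the OutputOnlyApp test helper added by the fix).

```java
import java.io.PrintStream;

// Hypothetical streaming mapper that, like /bin/env, never reads System.in.
class IgnoreInputMapper {
    // Writes `lines` tab-separated key/value records and flushes them.
    static void emit(PrintStream out, int lines) {
        for (int i = 0; i < lines; i++) {
            out.println("key" + i + "\tvalue" + i);
        }
        out.flush();
    }

    public static void main(String[] args) {
        int lines = args.length > 0 ? Integer.parseInt(args[0]) : 10;
        // Exits without consuming stdin, so input the framework is still
        // writing to this process can fail with a broken pipe.
        emit(System.out, lines);
    }
}
```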
      

      Examining the map task log shows this:

      Excerpt from map task stdout log
      2012-02-02 11:27:25,054 WARN [main] org.apache.hadoop.streaming.PipeMapRed: java.io.IOException: Broken pipe
      2012-02-02 11:27:25,054 INFO [main] org.apache.hadoop.streaming.PipeMapRed: mapRedFinished
      2012-02-02 11:27:25,056 WARN [Thread-12] org.apache.hadoop.streaming.PipeMapRed: java.io.IOException: Bad file descriptor
      2012-02-02 11:27:25,124 INFO [main] org.apache.hadoop.mapred.Task: Task:attempt_1328203555769_0001_m_000000_0 is done. And is in the process of commiting
      2012-02-02 11:27:25,127 WARN [Thread-11] org.apache.hadoop.streaming.PipeMapRed: java.io.IOException: DFSOutputStream is closed
      2012-02-02 11:27:25,199 INFO [main] org.apache.hadoop.mapred.Task: Task attempt_1328203555769_0001_m_000000_0 is allowed to commit now
      2012-02-02 11:27:25,225 INFO [main] org.apache.hadoop.mapred.FileOutputCommitter: Saved output of task 'attempt_1328203555769_0001_m_000000_0' to hdfs://localhost:9000/user/somebody/out/_temporary/1
      2012-02-02 11:27:27,834 INFO [main] org.apache.hadoop.mapred.Task: Task 'attempt_1328203555769_0001_m_000000_0' done.
      

      In PipeMapRed.mapRedFinished() we can see that it eats IOExceptions and returns without waiting for the output threads or throwing a runtime exception to fail the job. The net result is that the DFS streams can be shut down too early if the output threads are still busy, and we can lose job output.
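The failure mode can be illustrated with a small, self-contained Java model. Everything in it is hypothetical (`Sink`, `closeChildStdin()` and `finishWithoutWaiting()` are illustrative names, not Hadoop APIs): a thread stands in for the output thread that copies the child's stdout, and a latch forces the unlucky timing in which the child's last records arrive after the task has already closed its output.

```java
import java.io.IOException;
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;
import java.util.concurrent.CountDownLatch;

// Hypothetical model of the pre-fix shutdown path; not Hadoop's PipeMapRed code.
class EarlyReturnDemo {
    // Stand-in for a DFS output stream: writes after close() are lost.
    static final class Sink {
        private final List<String> records = Collections.synchronizedList(new ArrayList<>());
        private volatile boolean closed;

        void write(String record) {
            if (!closed) {
                records.add(record);
            }
        }

        void close() { closed = true; }

        int size() { return records.size(); }
    }

    // Simulates flushing/closing the child's stdin after the child already exited.
    static void closeChildStdin() throws IOException {
        throw new IOException("Broken pipe");
    }

    // Mirrors the reported behavior: the IOException is logged and eaten,
    // and the method returns before the output thread is joined.
    static void finishWithoutWaiting(Thread outputThread) throws InterruptedException {
        try {
            closeChildStdin();
        } catch (IOException e) {
            System.err.println("WARN: " + e);
            return;
        }
        outputThread.join();
    }

    // Returns how many of the child's n output records survive.
    static int run(int n) {
        Sink sink = new Sink();
        CountDownLatch childOutputReady = new CountDownLatch(1);
        Thread outputThread = new Thread(() -> {
            try {
                childOutputReady.await();
            } catch (InterruptedException e) {
                return;
            }
            for (int i = 0; i < n; i++) {
                sink.write("record-" + i);
            }
        });
        outputThread.start();
        try {
            finishWithoutWaiting(outputThread);
            sink.close();                 // the task commits and closes its output
            childOutputReady.countDown(); // the child's remaining output arrives too late
            outputThread.join();
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
        }
        return sink.size();
    }

    public static void main(String[] args) {
        System.out.println("records saved: " + run(5) + " of 5");
    }
}
```

With this forced timing the model prints `records saved: 0 of 5`; in a real task the loss depends on how far the output threads had got when the streams were closed.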

      Fixing this raises the bigger question of what should happen when a streaming job doesn't consume all of its input. Should we have grabbed all of the output from the job and still marked it successful, or should we have failed the job? If the former, then we need to fix some other places in the code as well, since feeding a much larger input file (e.g., 600K) to the same sample streaming job results in the job failing with the exception below. It wouldn't be consistent to fail a job that leaves a lot of input unconsumed but pass a job that leaves just a few leftovers.

      2012-02-02 10:29:37,220 INFO  mapreduce.Job (Job.java:monitorAndPrintJob(1270)) - Running job: job_1328200108174_0001
      2012-02-02 10:29:44,354 INFO  mapreduce.Job (Job.java:monitorAndPrintJob(1291)) - Job job_1328200108174_0001 running in uber mode : false
      2012-02-02 10:29:44,355 INFO  mapreduce.Job (Job.java:monitorAndPrintJob(1298)) -  map 0% reduce 0%
      2012-02-02 10:29:46,394 INFO  mapreduce.Job (Job.java:printTaskEvents(1386)) - Task Id : attempt_1328200108174_0001_m_000000_0, Status : FAILED
      Error: java.io.IOException: Broken pipe
      	at java.io.FileOutputStream.writeBytes(Native Method)
      	at java.io.FileOutputStream.write(FileOutputStream.java:282)
      	at java.io.BufferedOutputStream.write(BufferedOutputStream.java:105)
      	at java.io.BufferedOutputStream.flushBuffer(BufferedOutputStream.java:65)
      	at java.io.BufferedOutputStream.write(BufferedOutputStream.java:109)
      	at java.io.DataOutputStream.write(DataOutputStream.java:90)
      	at org.apache.hadoop.streaming.io.TextInputWriter.writeUTF8(TextInputWriter.java:72)
      	at org.apache.hadoop.streaming.io.TextInputWriter.writeValue(TextInputWriter.java:51)
      	at org.apache.hadoop.streaming.PipeMapper.map(PipeMapper.java:106)
      	at org.apache.hadoop.mapred.MapRunner.run(MapRunner.java:54)
      	at org.apache.hadoop.streaming.PipeMapRunner.run(PipeMapRunner.java:34)
      	at org.apache.hadoop.mapred.MapTask.runOldMapper(MapTask.java:394)
      	at org.apache.hadoop.mapred.MapTask.run(MapTask.java:329)
      	at org.apache.hadoop.mapred.YarnChild$2.run(YarnChild.java:147)
      	at java.security.AccessController.doPrivileged(Native Method)
      	at javax.security.auth.Subject.doAs(Subject.java:396)
      	at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1177)
      	at org.apache.hadoop.mapred.YarnChild.main(YarnChild.java:142)
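One plausible reading of the two outcomes, consistent with the stack trace above (the failing write goes through `BufferedOutputStream.flushBuffer`), is that records sent to the child are buffered. A tiny input never fills the buffer, so the closed pipe is first noticed by the final flush in mapRedFinished(), whose IOException is eaten; a large input fills the buffer during map(), and the broken pipe propagates. The sketch reproduces only that buffering effect with plain `java.io`; `ClosedPipe` is a hypothetical stand-in for the exited child's stdin, and `600 * 1024` bytes is an assumed reading of "600K".

```java
import java.io.BufferedOutputStream;
import java.io.IOException;
import java.io.OutputStream;
import java.nio.charset.StandardCharsets;

// Hypothetical model of where a broken pipe surfaces; not Hadoop's code.
class BufferedBrokenPipeDemo {
    // Every write fails, like writing to the stdin of a process that has exited.
    static final class ClosedPipe extends OutputStream {
        @Override
        public void write(int b) throws IOException {
            throw new IOException("Broken pipe");
        }

        @Override
        public void write(byte[] b, int off, int len) throws IOException {
            throw new IOException("Broken pipe");
        }
    }

    // Returns "map" if a record write failed, "flush" if only the final flush failed.
    static String whereItFails(int inputBytes) {
        // The default buffer holds at most 8 KB before it must be flushed.
        BufferedOutputStream toChild = new BufferedOutputStream(new ClosedPipe());
        byte[] record = "foo\n".getBytes(StandardCharsets.UTF_8);
        try {
            for (int written = 0; written < inputBytes; written += record.length) {
                toChild.write(record); // per-record write, as in map()
            }
        } catch (IOException e) {
            return "map";              // the exception reaches the task and fails it
        }
        try {
            toChild.flush();           // final flush, as in mapRedFinished()
        } catch (IOException e) {
            return "flush";            // pre-fix: logged and ignored
        }
        return "none";
    }

    public static void main(String[] args) {
        System.out.println("4-byte input fails at: " + whereItFails(4));
        System.out.println("600K input fails at:   " + whereItFails(600 * 1024));
    }
}
```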
      

      Assuming the job returns a successful exit code, I think we should allow it to complete successfully even though it doesn't consume all of its input. Part of the reasoning is that PipeMapper.java already contains this comment, which implies that we desire that behavior:

      PipeMapper.java
              // terminate with success:
              // swallow input records although the stream processor failed/closed
      

        Activity

        Hudson added a comment -

        Integrated in Hadoop-Mapreduce-trunk #1005 (See https://builds.apache.org/job/Hadoop-Mapreduce-trunk/1005/)
        MAPREDUCE-3790 Broken pipe on streaming job can lead to truncated output for a successful job (Jason Lowe via bobby) (Revision 1294750)
        MAPREDUCE-3790 Broken pipe on streaming job can lead to truncated output for a successful job (Jason Lowe via bobby) (Revision 1294743)

        Result = SUCCESS
        bobby : http://svn.apache.org/viewcvs.cgi/?root=Apache-SVN&view=rev&rev=1294750
        Files :

        • /hadoop/common/trunk/hadoop-mapreduce-project/CHANGES.txt

        bobby : http://svn.apache.org/viewcvs.cgi/?root=Apache-SVN&view=rev&rev=1294743
        Files :

        • /hadoop/common/trunk/hadoop-tools/hadoop-streaming/src/main/java/org/apache/hadoop/streaming/PipeMapRed.java
        • /hadoop/common/trunk/hadoop-tools/hadoop-streaming/src/test/java/org/apache/hadoop/streaming/OutputOnlyApp.java
        • /hadoop/common/trunk/hadoop-tools/hadoop-streaming/src/test/java/org/apache/hadoop/streaming/TestUnconsumedInput.java
        Hudson added a comment -

        Integrated in Hadoop-Mapreduce-0.23-Build #211 (See https://builds.apache.org/job/Hadoop-Mapreduce-0.23-Build/211/)
        svn merge -c 1294750 trunk to branch-0.23 FIXES MAPREDUCE-3790 Broken pipe on streaming job can lead to truncated output for a successful job (Jason Lowe via bobby) (Revision 1294751)
        svn merge -c 1294743 trunk to branch-0.23 FIXES MAPREDUCE-3790 Broken pipe on streaming job can lead to truncated output for a successful job (Jason Lowe via bobby) (Revision 1294747)

        Result = FAILURE
        bobby : http://svn.apache.org/viewcvs.cgi/?root=Apache-SVN&view=rev&rev=1294751
        Files :

        • /hadoop/common/branches/branch-0.23/hadoop-mapreduce-project/CHANGES.txt

        bobby : http://svn.apache.org/viewcvs.cgi/?root=Apache-SVN&view=rev&rev=1294747
        Files :

        • /hadoop/common/branches/branch-0.23/hadoop-tools/hadoop-streaming/src/main/java/org/apache/hadoop/streaming/PipeMapRed.java
        • /hadoop/common/branches/branch-0.23/hadoop-tools/hadoop-streaming/src/test/java/org/apache/hadoop/streaming/OutputOnlyApp.java
        • /hadoop/common/branches/branch-0.23/hadoop-tools/hadoop-streaming/src/test/java/org/apache/hadoop/streaming/TestUnconsumedInput.java
        Hudson added a comment -

        Integrated in Hadoop-Hdfs-0.23-Build #183 (See https://builds.apache.org/job/Hadoop-Hdfs-0.23-Build/183/)
        svn merge -c 1294750 trunk to branch-0.23 FIXES MAPREDUCE-3790 Broken pipe on streaming job can lead to truncated output for a successful job (Jason Lowe via bobby) (Revision 1294751)
        svn merge -c 1294743 trunk to branch-0.23 FIXES MAPREDUCE-3790 Broken pipe on streaming job can lead to truncated output for a successful job (Jason Lowe via bobby) (Revision 1294747)

        Result = FAILURE
        bobby : http://svn.apache.org/viewcvs.cgi/?root=Apache-SVN&view=rev&rev=1294751
        Files :

        • /hadoop/common/branches/branch-0.23/hadoop-mapreduce-project/CHANGES.txt

        bobby : http://svn.apache.org/viewcvs.cgi/?root=Apache-SVN&view=rev&rev=1294747
        Files :

        • /hadoop/common/branches/branch-0.23/hadoop-tools/hadoop-streaming/src/main/java/org/apache/hadoop/streaming/PipeMapRed.java
        • /hadoop/common/branches/branch-0.23/hadoop-tools/hadoop-streaming/src/test/java/org/apache/hadoop/streaming/OutputOnlyApp.java
        • /hadoop/common/branches/branch-0.23/hadoop-tools/hadoop-streaming/src/test/java/org/apache/hadoop/streaming/TestUnconsumedInput.java
        Hudson added a comment -

        Integrated in Hadoop-Hdfs-trunk #970 (See https://builds.apache.org/job/Hadoop-Hdfs-trunk/970/)
        MAPREDUCE-3790 Broken pipe on streaming job can lead to truncated output for a successful job (Jason Lowe via bobby) (Revision 1294750)
        MAPREDUCE-3790 Broken pipe on streaming job can lead to truncated output for a successful job (Jason Lowe via bobby) (Revision 1294743)

        Result = SUCCESS
        bobby : http://svn.apache.org/viewcvs.cgi/?root=Apache-SVN&view=rev&rev=1294750
        Files :

        • /hadoop/common/trunk/hadoop-mapreduce-project/CHANGES.txt

        bobby : http://svn.apache.org/viewcvs.cgi/?root=Apache-SVN&view=rev&rev=1294743
        Files :

        • /hadoop/common/trunk/hadoop-tools/hadoop-streaming/src/main/java/org/apache/hadoop/streaming/PipeMapRed.java
        • /hadoop/common/trunk/hadoop-tools/hadoop-streaming/src/test/java/org/apache/hadoop/streaming/OutputOnlyApp.java
        • /hadoop/common/trunk/hadoop-tools/hadoop-streaming/src/test/java/org/apache/hadoop/streaming/TestUnconsumedInput.java
        Hudson added a comment -

        Integrated in Hadoop-Mapreduce-0.23-Commit #609 (See https://builds.apache.org/job/Hadoop-Mapreduce-0.23-Commit/609/)
        svn merge -c 1294750 trunk to branch-0.23 FIXES MAPREDUCE-3790 Broken pipe on streaming job can lead to truncated output for a successful job (Jason Lowe via bobby) (Revision 1294751)

        Result = ABORTED
        bobby : http://svn.apache.org/viewcvs.cgi/?root=Apache-SVN&view=rev&rev=1294751
        Files :

        • /hadoop/common/branches/branch-0.23/hadoop-mapreduce-project/CHANGES.txt
        Hudson added a comment -

        Integrated in Hadoop-Mapreduce-trunk-Commit #1804 (See https://builds.apache.org/job/Hadoop-Mapreduce-trunk-Commit/1804/)
        MAPREDUCE-3790 Broken pipe on streaming job can lead to truncated output for a successful job (Jason Lowe via bobby) (Revision 1294750)
        MAPREDUCE-3790 Broken pipe on streaming job can lead to truncated output for a successful job (Jason Lowe via bobby) (Revision 1294743)

        Result = ABORTED
        bobby : http://svn.apache.org/viewcvs.cgi/?root=Apache-SVN&view=rev&rev=1294750
        Files :

        • /hadoop/common/trunk/hadoop-mapreduce-project/CHANGES.txt

        bobby : http://svn.apache.org/viewcvs.cgi/?root=Apache-SVN&view=rev&rev=1294743
        Files :

        • /hadoop/common/trunk/hadoop-tools/hadoop-streaming/src/main/java/org/apache/hadoop/streaming/PipeMapRed.java
        • /hadoop/common/trunk/hadoop-tools/hadoop-streaming/src/test/java/org/apache/hadoop/streaming/OutputOnlyApp.java
        • /hadoop/common/trunk/hadoop-tools/hadoop-streaming/src/test/java/org/apache/hadoop/streaming/TestUnconsumedInput.java
        Hudson added a comment -

        Integrated in Hadoop-Mapreduce-0.23-Commit #608 (See https://builds.apache.org/job/Hadoop-Mapreduce-0.23-Commit/608/)
        svn merge -c 1294743 trunk to branch-0.23 FIXES MAPREDUCE-3790 Broken pipe on streaming job can lead to truncated output for a successful job (Jason Lowe via bobby) (Revision 1294747)

        Result = ABORTED
        bobby : http://svn.apache.org/viewcvs.cgi/?root=Apache-SVN&view=rev&rev=1294747
        Files :

        • /hadoop/common/branches/branch-0.23/hadoop-tools/hadoop-streaming/src/main/java/org/apache/hadoop/streaming/PipeMapRed.java
        • /hadoop/common/branches/branch-0.23/hadoop-tools/hadoop-streaming/src/test/java/org/apache/hadoop/streaming/OutputOnlyApp.java
        • /hadoop/common/branches/branch-0.23/hadoop-tools/hadoop-streaming/src/test/java/org/apache/hadoop/streaming/TestUnconsumedInput.java
        Hudson added a comment -

        Integrated in Hadoop-Hdfs-0.23-Commit #595 (See https://builds.apache.org/job/Hadoop-Hdfs-0.23-Commit/595/)
        svn merge -c 1294750 trunk to branch-0.23 FIXES MAPREDUCE-3790 Broken pipe on streaming job can lead to truncated output for a successful job (Jason Lowe via bobby) (Revision 1294751)

        Result = SUCCESS
        bobby : http://svn.apache.org/viewcvs.cgi/?root=Apache-SVN&view=rev&rev=1294751
        Files :

        • /hadoop/common/branches/branch-0.23/hadoop-mapreduce-project/CHANGES.txt
        Hudson added a comment -

        Integrated in Hadoop-Common-0.23-Commit #607 (See https://builds.apache.org/job/Hadoop-Common-0.23-Commit/607/)
        svn merge -c 1294750 trunk to branch-0.23 FIXES MAPREDUCE-3790 Broken pipe on streaming job can lead to truncated output for a successful job (Jason Lowe via bobby) (Revision 1294751)
        svn merge -c 1294743 trunk to branch-0.23 FIXES MAPREDUCE-3790 Broken pipe on streaming job can lead to truncated output for a successful job (Jason Lowe via bobby) (Revision 1294747)

        Result = SUCCESS
        bobby : http://svn.apache.org/viewcvs.cgi/?root=Apache-SVN&view=rev&rev=1294751
        Files :

        • /hadoop/common/branches/branch-0.23/hadoop-mapreduce-project/CHANGES.txt

        bobby : http://svn.apache.org/viewcvs.cgi/?root=Apache-SVN&view=rev&rev=1294747
        Files :

        • /hadoop/common/branches/branch-0.23/hadoop-tools/hadoop-streaming/src/main/java/org/apache/hadoop/streaming/PipeMapRed.java
        • /hadoop/common/branches/branch-0.23/hadoop-tools/hadoop-streaming/src/test/java/org/apache/hadoop/streaming/OutputOnlyApp.java
        • /hadoop/common/branches/branch-0.23/hadoop-tools/hadoop-streaming/src/test/java/org/apache/hadoop/streaming/TestUnconsumedInput.java
        Hudson added a comment -

        Integrated in Hadoop-Common-trunk-Commit #1794 (See https://builds.apache.org/job/Hadoop-Common-trunk-Commit/1794/)
        MAPREDUCE-3790 Broken pipe on streaming job can lead to truncated output for a successful job (Jason Lowe via bobby) (Revision 1294750)
        MAPREDUCE-3790 Broken pipe on streaming job can lead to truncated output for a successful job (Jason Lowe via bobby) (Revision 1294743)

        Result = SUCCESS
        bobby : http://svn.apache.org/viewcvs.cgi/?root=Apache-SVN&view=rev&rev=1294750
        Files :

        • /hadoop/common/trunk/hadoop-mapreduce-project/CHANGES.txt

        bobby : http://svn.apache.org/viewcvs.cgi/?root=Apache-SVN&view=rev&rev=1294743
        Files :

        • /hadoop/common/trunk/hadoop-tools/hadoop-streaming/src/main/java/org/apache/hadoop/streaming/PipeMapRed.java
        • /hadoop/common/trunk/hadoop-tools/hadoop-streaming/src/test/java/org/apache/hadoop/streaming/OutputOnlyApp.java
        • /hadoop/common/trunk/hadoop-tools/hadoop-streaming/src/test/java/org/apache/hadoop/streaming/TestUnconsumedInput.java
        Hudson added a comment -

        Integrated in Hadoop-Hdfs-trunk-Commit #1868 (See https://builds.apache.org/job/Hadoop-Hdfs-trunk-Commit/1868/)
        MAPREDUCE-3790 Broken pipe on streaming job can lead to truncated output for a successful job (Jason Lowe via bobby) (Revision 1294750)
        MAPREDUCE-3790 Broken pipe on streaming job can lead to truncated output for a successful job (Jason Lowe via bobby) (Revision 1294743)

        Result = SUCCESS
        bobby : http://svn.apache.org/viewcvs.cgi/?root=Apache-SVN&view=rev&rev=1294750
        Files :

        • /hadoop/common/trunk/hadoop-mapreduce-project/CHANGES.txt

        bobby : http://svn.apache.org/viewcvs.cgi/?root=Apache-SVN&view=rev&rev=1294743
        Files :

        • /hadoop/common/trunk/hadoop-tools/hadoop-streaming/src/main/java/org/apache/hadoop/streaming/PipeMapRed.java
        • /hadoop/common/trunk/hadoop-tools/hadoop-streaming/src/test/java/org/apache/hadoop/streaming/OutputOnlyApp.java
        • /hadoop/common/trunk/hadoop-tools/hadoop-streaming/src/test/java/org/apache/hadoop/streaming/TestUnconsumedInput.java
        Hudson added a comment -

        Integrated in Hadoop-Hdfs-0.23-Commit #594 (See https://builds.apache.org/job/Hadoop-Hdfs-0.23-Commit/594/)
        svn merge -c 1294743 trunk to branch-0.23 FIXES MAPREDUCE-3790 Broken pipe on streaming job can lead to truncated output for a successful job (Jason Lowe via bobby) (Revision 1294747)

        Result = SUCCESS
        bobby : http://svn.apache.org/viewcvs.cgi/?root=Apache-SVN&view=rev&rev=1294747
        Files :

        • /hadoop/common/branches/branch-0.23/hadoop-tools/hadoop-streaming/src/main/java/org/apache/hadoop/streaming/PipeMapRed.java
        • /hadoop/common/branches/branch-0.23/hadoop-tools/hadoop-streaming/src/test/java/org/apache/hadoop/streaming/OutputOnlyApp.java
        • /hadoop/common/branches/branch-0.23/hadoop-tools/hadoop-streaming/src/test/java/org/apache/hadoop/streaming/TestUnconsumedInput.java
        Robert Joseph Evans added a comment -

        Thanks Jason, I just committed this to trunk, branch-0.23 and 0.23.2.

        Robert Joseph Evans added a comment -

        The patch looks good to me, +1. I don't like just eating exceptions and dumping them to a log, but I don't see what else to do in this case. The process has exited, and by exiting it indicates that it does not want to process any more data, so it looks OK to me.

        Jason Lowe added a comment -

        Sorry, I should have clarified this when I posted the manual test-patch run. The reported javadoc warnings are unrelated to the patch: they come from the following projects, not from anything in hadoop-streaming:

        • hadoop-auth
        • hadoop-common
        • hadoop-rumen (most are here)
        • hadoop-extras
        Arun C Murthy added a comment -

        Jason, can you pls look at the javadoc warnings?

        Jason Lowe added a comment -

        test-patch.sh has issues with patches touching hadoop-tools, so I manually ran test-patch from the root:

        -1 overall.

        +1 @author. The patch does not contain any @author tags.

        +1 tests included. The patch appears to include 4 new or modified tests.

        -1 javadoc. The javadoc tool appears to have generated 18 warning messages.

        +1 javac. The applied patch does not increase the total number of javac compiler warnings.

        +1 eclipse:eclipse. The patch built with eclipse:eclipse.

        +1 findbugs. The patch does not introduce any new Findbugs (version ) warnings.

        +1 release audit. The applied patch does not increase the total number of release audit warnings.

        Note that the javadoc warnings are unrelated. I manually ran the additional test case and it passed.

        Hadoop QA added a comment -

        -1 overall. Here are the results of testing the latest attachment
        http://issues.apache.org/jira/secure/attachment/12513202/MAPREDUCE-3790.patch
        against trunk revision .

        +1 @author. The patch does not contain any @author tags.

        +1 tests included. The patch appears to include 4 new or modified tests.

        -1 patch. The patch command could not apply the patch.

        Console output: https://builds.apache.org/job/PreCommit-MAPREDUCE-Build/1771//console

        This message is automatically generated.

        Jason Lowe added a comment -

        The patch changes mapRedFinished() so that it still attempts to wait for the output threads to complete before returning, even if an IOException occurred while flushing and closing the input stream.

        Added a test case to verify that stream.minRecWrittenToEnableSkip_=0 works as intended, and manually verified that this patch fixes the /bin/env test case.
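
        To illustrate the ordering change described in the comment above, here is a minimal Java sketch. It is not the actual PipeMapRed code: the class FinishSketch, the finish() method, and the simulated "broken" stdin and output thread are hypothetical stand-ins. The point it shows is that an IOException (such as "Broken pipe") from flushing or closing the child's stdin is logged, and the method still joins the output thread before returning, so the caller cannot close the task output while records are still being drained.

```java
import java.io.IOException;
import java.io.OutputStream;
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;

public class FinishSketch {

    /**
     * Hypothetical, simplified analogue of the patched mapRedFinished():
     * flush/close the child's stdin, but wait for the output thread even
     * if that flush/close throws.
     */
    static void finish(OutputStream childStdin, Thread outputThread) throws InterruptedException {
        try {
            childStdin.flush();
            childStdin.close();
        } catch (IOException e) {
            // Log and fall through instead of returning early.
            System.err.println("WARN: " + e.getMessage());
        }
        // Only after the output thread has drained everything is it safe
        // for the caller to close the task's output stream.
        outputThread.join();
    }

    /** Demo with a broken child stdin; returns the number of records collected. */
    static int drainWithBrokenStdin() {
        // Simulates a child process that exited without reading all its input.
        OutputStream brokenStdin = new OutputStream() {
            @Override
            public void write(int b) throws IOException {
                throw new IOException("Broken pipe");
            }

            @Override
            public void flush() throws IOException {
                throw new IOException("Broken pipe");
            }
        };

        List<String> collected = Collections.synchronizedList(new ArrayList<>());
        // Simulates the thread copying the child's stdout to the task output;
        // the sleep makes it deliberately slower than the stdin shutdown.
        Thread outputThread = new Thread(() -> {
            try {
                Thread.sleep(100);
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt();
                return;
            }
            for (int i = 0; i < 1000; i++) {
                collected.add("record-" + i);
            }
        });
        outputThread.start();

        try {
            finish(brokenStdin, outputThread);
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            throw new IllegalStateException(e);
        }
        // Here the caller would close the output; no records have been lost.
        return collected.size();
    }

    public static void main(String[] args) {
        System.out.println("records written: " + drainWithBrokenStdin());
    }
}
```

        Without the join() after the caught exception, the method would return while the output thread was still sleeping, and closing the output at that point would truncate it, which matches the "DFSOutputStream is closed" warning in the task log.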

        Jason Lowe added a comment -

        Upon closer investigation of the code, there is already a config option, stream.minRecWrittenToEnableSkip_, that specifies input errors should be ignored once a certain number of records have been output. Setting it to 0 ignores any error on the input, such as a broken pipe.

        That still leaves the race condition in mapRedFinished(), where the DFSOutputStream can be closed before the output thread has finished. However, there is existing support for allowing streaming jobs to ignore their input.
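
        The threshold rule described in the comment above can be modeled with a one-line predicate. This is a hypothetical sketch, not the hadoop-streaming implementation: the class SkipThresholdSketch and the method shouldIgnoreInputError() are illustrative names, and the sketch captures only the comparison the comment describes (ignore an input error once at least the configured number of records has been written).

```java
public class SkipThresholdSketch {

    /**
     * Hypothetical model: an IOException while feeding input to the child
     * is ignored only once recordsWritten has reached the configured
     * stream.minRecWrittenToEnableSkip_ value.
     */
    static boolean shouldIgnoreInputError(long recordsWritten, long minRecWrittenToEnableSkip) {
        return recordsWritten >= minRecWrittenToEnableSkip;
    }

    public static void main(String[] args) {
        // With a threshold of 0, any input error (e.g. broken pipe) is ignored,
        // even before a single record has been written.
        System.out.println(shouldIgnoreInputError(0, 0));
        // With a threshold of 10, an error after only 5 records still fails the task.
        System.out.println(shouldIgnoreInputError(5, 10));
        System.out.println(shouldIgnoreInputError(10, 10));
    }
}
```

        Presumably the option is passed like the other -D settings in the description, e.g. -Dstream.minRecWrittenToEnableSkip_=0 on the hadoop-streaming command line.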


          People

          • Assignee:
            Jason Lowe
            Reporter:
            Jason Lowe
          • Votes:
            0
            Watchers:
            5
