Hadoop Common / HADOOP-4163

If a reducer fails at the shuffle stage, the task should fail, not just log an exception


Details

    • Type: Bug
    • Status: Closed
    • Priority: Blocker
    • Resolution: Fixed
    • Affects Version/s: 0.17.1
    • Fix Version/s: 0.19.0
    • Component/s: None
    • Labels: None
    • Hadoop Flags: Reviewed

    Description

      I saw a reducer stuck at the shuffle stage. Its log file contained the following exceptions:

      2008-08-30 00:16:23,265 ERROR org.apache.hadoop.mapred.ReduceTask: Map output copy failure: org.apache.hadoop.fs.FSError: java.io.IOException: No space left on device
      at org.apache.hadoop.fs.RawLocalFileSystem$LocalFSFileOutputStream.write(RawLocalFileSystem.java:199)
      at java.io.BufferedOutputStream.flushBuffer(BufferedOutputStream.java:65)
      at java.io.BufferedOutputStream.flush(BufferedOutputStream.java:123)
      at java.io.FilterOutputStream.close(FilterOutputStream.java:140)
      at org.apache.hadoop.fs.FSDataOutputStream$PositionCache.close(FSDataOutputStream.java:59)
      at org.apache.hadoop.fs.FSDataOutputStream.close(FSDataOutputStream.java:79)
      at org.apache.hadoop.fs.ChecksumFileSystem$ChecksumFSOutputSummer.close(ChecksumFileSystem.java:332)
      at org.apache.hadoop.fs.FSDataOutputStream$PositionCache.close(FSDataOutputStream.java:59)
      at org.apache.hadoop.fs.FSDataOutputStream.close(FSDataOutputStream.java:79)
      at org.apache.hadoop.mapred.MapOutputLocation.getFile(MapOutputLocation.java:185)
      at org.apache.hadoop.mapred.ReduceTask$ReduceCopier$MapOutputCopier.copyOutput(ReduceTask.java:815)
      at org.apache.hadoop.mapred.ReduceTask$ReduceCopier$MapOutputCopier.run(ReduceTask.java:764)
      Caused by: java.io.IOException: No space left on device
      at java.io.FileOutputStream.writeBytes(Native Method)
      at java.io.FileOutputStream.write(FileOutputStream.java:260)
      at org.apache.hadoop.fs.RawLocalFileSystem$LocalFSFileOutputStream.write(RawLocalFileSystem.java:197)
      ... 11 more

      2008-08-30 00:16:23,320 WARN org.apache.hadoop.mapred.TaskTracker: Error running child
      java.io.IOException: task_200808291851_0001_r_000023_0The reduce copier failed
      at org.apache.hadoop.mapred.ReduceTask.run(ReduceTask.java:329)
      at org.apache.hadoop.mapred.TaskTracker$Child.main(TaskTracker.java:2122)

      The task should have died.
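
      The expected behavior can be illustrated with a minimal, self-contained Java sketch. It is not the attached patch and does not use Hadoop classes: `CopierFailureSketch`, `Copier`, `runShuffle`, and the `simulateDiskFull` flag are hypothetical names invented for illustration. The idea, under those assumptions, is that a copier thread records the first fatal error instead of only logging it, and the thread that owns the task checks for that error and fails.

```java
import java.io.IOException;
import java.util.concurrent.atomic.AtomicReference;

public class CopierFailureSketch {

    /** Hypothetical stand-in for a map-output copier thread. */
    static class Copier extends Thread {
        private final AtomicReference<Throwable> fatalError;
        private final boolean simulateDiskFull; // assumption: test hook only

        Copier(AtomicReference<Throwable> fatalError, boolean simulateDiskFull) {
            this.fatalError = fatalError;
            this.simulateDiskFull = simulateDiskFull;
        }

        @Override
        public void run() {
            try {
                if (simulateDiskFull) {
                    // Simulates the local-disk write failure seen in the log.
                    throw new IOException("No space left on device");
                }
            } catch (Throwable t) {
                // Catch Throwable so that Errors as well as Exceptions are
                // recorded; keep only the first failure.
                fatalError.compareAndSet(null, t);
            }
        }
    }

    /** Runs one copier and fails the caller if the copier hit a fatal error. */
    static void runShuffle(boolean simulateDiskFull) throws IOException {
        AtomicReference<Throwable> fatalError = new AtomicReference<>();
        Copier copier = new Copier(fatalError, simulateDiskFull);
        copier.start();
        try {
            copier.join();
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            throw new IOException("Interrupted while waiting for copier", e);
        }
        Throwable t = fatalError.get();
        if (t != null) {
            // Propagate instead of only logging, so the task actually fails.
            throw new IOException("The reduce copier failed", t);
        }
    }

    public static void main(String[] args) {
        try {
            runShuffle(true);
            System.out.println("shuffle completed");
        } catch (IOException e) {
            // A real task process would terminate with a failure status here.
            System.out.println("task failed: " + e.getMessage()
                    + " (cause: " + e.getCause().getMessage() + ")");
        }
    }
}
```

      Calling `runShuffle(false)` returns normally, while `runShuffle(true)` throws, which is the point at which a real task would be expected to die rather than stay stuck.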

      Attachments

        1. 4163_v1.patch
          3 kB
          Sharad Agarwal
        2. 4163_v2.patch
          2 kB
          Sharad Agarwal
        3. 4163_v3.patch
          2 kB
          Sharad Agarwal

            People

              Assignee: Sharad Agarwal (sharadag)
              Reporter: Runping Qi (runping)