Hadoop Map/Reduce: MAPREDUCE-79

Ignored IOExceptions from MapOutputLocation.java:getFile lead to hung reduces

Details

    • Type: Bug
    • Status: Open
    • Priority: Major
    • Resolution: Unresolved

    Description

      Ignoring IOExceptions while fetching map outputs in MapOutputLocation.java:getFile (e.g. when the content-length doesn't match the data actually received) leads to hung reduces, because the MapOutputCopier puts the host in the penalty box and retries forever.
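
To illustrate the content-length case, here is a minimal sketch of a copy loop that surfaces a short read as an IOException instead of swallowing it. The class and method names (MapOutputFetchSketch, copyAndVerify) are hypothetical and are not Hadoop's actual MapOutputLocation code; the sketch only assumes that the expected length is known from the response header.

```java
import java.io.IOException;
import java.io.InputStream;
import java.io.OutputStream;

// Hypothetical helper for illustration only; not part of Hadoop.
class MapOutputFetchSketch {

    /**
     * Copies the fetched map output to 'out' and checks the number of bytes
     * received against the advertised content length.
     *
     * @param expectedLength advertised content length, or a negative value if unknown
     * @return the number of bytes copied
     * @throws IOException if the stream fails or the byte count does not match
     */
    static long copyAndVerify(InputStream in, OutputStream out, long expectedLength)
            throws IOException {
        byte[] buf = new byte[4096];
        long total = 0;
        int n;
        while ((n = in.read(buf)) != -1) {
            out.write(buf, 0, n);
            total += n;
        }
        // Propagate the mismatch so the caller can count it as a fetch failure.
        if (expectedLength >= 0 && total != expectedLength) {
            throw new IOException("Incomplete map output: expected "
                    + expectedLength + " bytes, received " + total);
        }
        return total;
    }
}
```

The point of the sketch is only that the exception reaches the caller, which can then decide whether to retry or give up.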

      Possible steps:
      a) Distinguish between a failure to fetch output and lost maps (related to HADOOP-1158).
      b) Ensure the reduce doesn't keep fetching from 'lost maps' (related to HADOOP-1183).
      c) On detecting a 'failure to fetch', we should probably use exponential back-offs (rather than the same-order back-offs used currently) for hosts in the 'penalty box'.
      d) If fetches still fail, say, 4 times (after exponential back-offs), we should declare the Reduce as 'failed'.
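
Steps (c) and (d) could be sketched roughly as follows. This is an illustration under stated assumptions, not the existing MapOutputCopier logic: the class name FetchPenaltyBox, the per-host failure counting, the 1-second base delay, and the -1 sentinel are all hypothetical; only the limit of 4 failures comes from the proposal above.

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical penalty box for illustration only; not part of Hadoop.
class FetchPenaltyBox {
    static final int MAX_FETCH_FAILURES = 4;    // "say 4 times" from step (d)
    static final long BASE_BACKOFF_MS = 1000L;  // assumed base delay

    // Consecutive fetch failures per host.
    private final Map<String, Integer> failures = new HashMap<>();

    /**
     * Records a failed fetch from 'host'.
     *
     * @return the back-off in milliseconds before the next retry, doubling
     *         with each consecutive failure, or -1 once the failure limit is
     *         reached and the reduce should be declared failed
     */
    long recordFailure(String host) {
        int count = failures.merge(host, 1, Integer::sum);
        if (count >= MAX_FETCH_FAILURES) {
            return -1;
        }
        return BASE_BACKOFF_MS << (count - 1);
    }

    /** Clears the failure history for 'host' after a successful fetch. */
    void recordSuccess(String host) {
        failures.remove(host);
    }
}
```

With these assumed constants, the first three failures for a host yield back-offs of 1000, 2000, and 4000 ms, and the fourth returns -1, at which point the caller would mark the reduce as failed rather than retrying forever.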

      The same hang could also arise from conditions such as full disks on the reducer, where the map output cannot be saved to the local disk (say, for large map outputs).

      Thoughts?

            People

              Assignee: Unassigned
              Reporter: Arun Murthy (acmurthy)
              Votes: 0
              Watchers: 1
