  1. Hadoop Map/Reduce
  2. MAPREDUCE-4560

Job can get stuck in a deadlock between mappers and reducers for low values of mapreduce.job.reduce.slowstart.completedmaps (<<1)

Details

    • Type: Bug
    • Status: Resolved
    • Priority: Major
    • Resolution: Duplicate

    Description

      In our lab systems, this issue has been seen with MapReduceV2 but never with MapReduceV1.

      The parameter mapreduce.job.reduce.slowstart.completedmaps was set to 0.05 (the default value).

      We found the ApplicationMaster stuck in a deadlock between mappers and reducers, with the job making no progress. The sequence appears to be:

      1. The initially available map/reduce slots were allocated to mappers.
      2. Once the mappers made progress and a few of them completed, reducers started occupying a few of the slots because of the low value of the above configuration parameter.
      3. The scheduler does not appear to give mappers priority over reducers; after a while, we saw all slots in our system occupied by reducers.
      4. Since some mapper tasks had still not been assigned a slot, the map phase never completed.
      5. The system entered a deadlock state: reducers occupy all available slots but are waiting for the mappers to complete, and the mappers cannot move forward because no slot is available.

      The workaround in our system was to set
      mapreduce.job.reduce.slowstart.completedmaps=1, after which the issue was no longer seen.
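
The sequence and the workaround can be illustrated with a toy model. This is only a sketch under stated assumptions, not the actual ApplicationMaster or scheduler logic: the function `simulate` and all of its parameters are hypothetical, each map is assumed to take one tick, a reducer is assumed to hold its slot until every map has finished, and free slots are assumed to go to reducers first once the slowstart threshold is reached.

```python
def simulate(total_maps, total_reduces, slots, slowstart, max_ticks=1000):
    """Toy model of the reported sequence; NOT the real MRv2 scheduling code.

    Hypothetical assumptions: every map takes exactly one tick, a running
    reducer never releases its slot before all maps are done, and reducers
    are served before pending maps once the slowstart threshold is met.
    Returns a (status, tick) tuple.
    """
    maps_done = 0
    maps_running = 0
    reduces_running = 0
    for tick in range(max_ticks):
        # Maps started in the previous tick finish now and free their slots.
        maps_done += maps_running
        maps_running = 0
        if maps_done == total_maps:
            return ("maps_done", tick)  # reducers can now fetch all map output

        free = slots - reduces_running
        # Reducers become eligible once the completed-map fraction is reached.
        if maps_done >= slowstart * total_maps:
            started = min(free, total_reduces - reduces_running)
            reduces_running += started
            free -= started

        # Whatever is left goes to the maps that are still pending.
        pending_maps = total_maps - maps_done
        maps_running = min(free, pending_maps)
        if maps_running == 0 and pending_maps > 0:
            # Reducers hold every slot while maps still wait: no progress.
            return ("deadlock", tick)
    return ("timeout", max_ticks)


if __name__ == "__main__":
    # Low slowstart value, more reducers than slots.
    print(simulate(total_maps=100, total_reduces=20, slots=10, slowstart=0.05))
    # Workaround value: reducers wait until all maps have completed.
    print(simulate(total_maps=100, total_reduces=20, slots=10, slowstart=1.0))
```

In this model the deadlock only appears when the number of reducers is at least the number of slots; with fewer reducers, the remaining slots let the maps drain.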

            People

              Assignee: Unassigned
              Reporter: Rahul Jain (rjain7)
              Votes: 0
              Watchers: 9