Hadoop Map/Reduce / MAPREDUCE-5928

Deadlock allocating containers for mappers and reducers


    Details

    • Type: Bug
    • Status: Resolved
    • Priority: Major
    • Resolution: Duplicate
    • Affects Version/s: None
    • Fix Version/s: None
    • Component/s: None
    • Labels: None
    • Environment: Hadoop 2.4.0 (as packaged by Hortonworks in HDP 2.1.2)

    Description

      I have a small cluster consisting of 8 desktop-class systems (1 master + 7 workers).
      Because these systems have little memory, I configured YARN as follows:

      yarn.nodemanager.resource.memory-mb = 2200
      yarn.scheduler.minimum-allocation-mb = 250

      On my client I set:

      mapreduce.map.memory.mb = 512
      mapreduce.reduce.memory.mb = 512
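
      For illustration only, a minimal driver sketch (class and job names are hypothetical, not from this report) of how the two client-side values above can be set on the job configuration. The two NodeManager values (yarn.nodemanager.resource.memory-mb, yarn.scheduler.minimum-allocation-mb) are cluster-side settings in yarn-site.xml and cannot be set from the client:

      import org.apache.hadoop.conf.Configuration;
      import org.apache.hadoop.mapreduce.Job;

      // Hypothetical driver sketch: sets the per-task container sizes used in this report.
      // The cluster-side limits (yarn.nodemanager.resource.memory-mb = 2200,
      // yarn.scheduler.minimum-allocation-mb = 250) live in yarn-site.xml on each NodeManager.
      public class MemoryConfigSketch {
        public static Job createJob() throws Exception {
          Configuration conf = new Configuration();
          conf.setInt("mapreduce.map.memory.mb", 512);     // container size for each map task
          conf.setInt("mapreduce.reduce.memory.mb", 512);  // container size for each reduce task
          return Job.getInstance(conf, "memory-config-sketch");
        }
      }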

      Then I ran a job with 27 mappers and 32 reducers.
      After a while the job ran into the following deadlock:

      • All nodes had been filled to their maximum capacity with reducers.
      • One mapper was still waiting for a container to start in.

      I tried killing reducer attempts, but that didn't help: new reducer attempts simply took over the freed containers. Because the reducers cannot complete until all map output is available, and the remaining mapper cannot start until a container is released, the job was stuck indefinitely.

      Workaround:
      From my job I set the following value (the default is 0.05, i.e. 5%), so that reducers are not requested until 99% of the maps have completed:

      mapreduce.job.reduce.slowstart.completedmaps = 0.99f
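
      For illustration, a hedged sketch (hypothetical class name) of setting this workaround programmatically on the job configuration:

      import org.apache.hadoop.conf.Configuration;
      import org.apache.hadoop.mapreduce.Job;

      // Hypothetical sketch: raise reduce slowstart from the 5% default to 99%,
      // so reduce containers are only requested once nearly all maps are done.
      public class SlowstartWorkaroundSketch {
        public static Job createJob() throws Exception {
          Configuration conf = new Configuration();
          conf.setFloat("mapreduce.job.reduce.slowstart.completedmaps", 0.99f);
          return Job.getInstance(conf, "slowstart-workaround-sketch");
        }
      }

      The same value can also be passed on the command line as -Dmapreduce.job.reduce.slowstart.completedmaps=0.99 when the driver uses ToolRunner.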

    Attachments

    1. AM-MR-syslog - Cleaned.txt.gz (420 kB, Niels Basjes)
    2. Cluster fully loaded.png.jpg (141 kB, Niels Basjes)
    3. MR job stuck in deadlock.png.jpg (62 kB, Niels Basjes)

    People

    • Assignee: Unassigned
    • Reporter: Niels Basjes (nielsbasjes)
    • Votes: 0
    • Watchers: 10
