Hadoop Map/Reduce / MAPREDUCE-4304

Deadlock where all containers are held by ApplicationMasters should be prevented


    Details

    • Type: Improvement
    • Status: Open
    • Priority: Major
    • Resolution: Unresolved
    • Affects Version/s: 0.23.1
    • Fix Version/s: None
    • Component/s: mrv2, resourcemanager
    • Labels: None

      Description

      In my test cluster of 4 NodeManagers, each with only ~1.6 GB of container memory, submitting a burst of jobs concurrently (e.g. more than 10) is likely to result in 4 jobs being accepted and 4 ApplicationMasters being allocated. The jobs then block each other indefinitely: every ApplicationMaster is waiting for more containers, and no container is freed until some job finishes.
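
The scenario above can be reproduced with a toy model. This is a minimal sketch, not YARN's scheduler: it assumes each NodeManager offers 1600 MB, and the AM size (1536 MB) and task container size (1024 MB) are hypothetical values chosen for illustration. Jobs are admitted greedily whenever an AM container fits, and the cluster counts as deadlocked once every admitted job still needs a task container that no node can host.

```python
from dataclasses import dataclass

# Sizes in MB. NODE_MB approximates the ~1.6 GB from the report;
# AM_MB and TASK_MB are hypothetical values for illustration.
NODE_MB = 1600
AM_MB = 1536
TASK_MB = 1024


@dataclass
class Node:
    free_mb: int = NODE_MB


def place(nodes, size_mb):
    """Put a container of size_mb on the first node with room; True on success."""
    for node in nodes:
        if node.free_mb >= size_mb:
            node.free_mb -= size_mb
            return True
    return False


def admit_burst(num_nodes, num_jobs):
    """Naive admission: every job that obtains an AM container is accepted."""
    nodes = [Node() for _ in range(num_nodes)]
    running = 0
    for _ in range(num_jobs):
        if place(nodes, AM_MB):
            running += 1
    return nodes, running


def is_deadlocked(nodes, running):
    """Each running job needs at least one task container, but none fits anywhere."""
    return running > 0 and all(n.free_mb < TASK_MB for n in nodes)


nodes, running = admit_burst(num_nodes=4, num_jobs=12)
print(running, is_deadlocked(nodes, running))  # 4 True
```

With only 2 jobs submitted, two nodes stay empty and tasks can still be scheduled, so the model reports no deadlock; the problem appears only when the burst is large enough for AMs to occupy every node.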

      Note that the problem is not limited to tiny clusters like this one. As long as jobs are submitted faster than they finish, the cluster may enter a vicious cycle in which more and more containers are held by ApplicationMasters.
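
One way to break such a cycle is to bound how much of the cluster ApplicationMasters may hold, so that later submissions wait in the queue instead of claiming AM containers. The sketch below is hypothetical: `max_am_fraction` is an illustrative knob, not an existing configuration property, and the sizes (1600 MB per node, 1536 MB per AM, 1024 MB per task) are assumed values for illustration.

```python
# Sizes in MB; all values are assumptions for illustration.
NODE_MB = 1600   # ~1.6 GB per NodeManager
AM_MB = 1536     # assumed AM container size
TASK_MB = 1024   # assumed task container size


def admit_with_cap(num_nodes, num_jobs, max_am_fraction):
    """Admit jobs only while total AM memory stays within max_am_fraction
    of cluster memory. Returns (free MB per node, number of admitted jobs)."""
    free = [NODE_MB] * num_nodes
    am_budget = int(max_am_fraction * NODE_MB * num_nodes)
    am_used = 0
    admitted = 0
    for _ in range(num_jobs):
        if am_used + AM_MB > am_budget:
            break  # remaining jobs stay queued instead of grabbing AM containers
        # First node that can host an AM container, or None if none can.
        slot = next((i for i, f in enumerate(free) if f >= AM_MB), None)
        if slot is None:
            break
        free[slot] -= AM_MB
        am_used += AM_MB
        admitted += 1
    return free, admitted


free, admitted = admit_with_cap(num_nodes=4, num_jobs=12, max_am_fraction=0.5)
task_slots = sum(f // TASK_MB for f in free)
print(admitted, task_slots)  # 2 2
```

With `max_am_fraction=1.0` the same model admits all 4 AMs and leaves no room for tasks. A cap like this only guarantees that some memory remains for tasks; whether that is enough depends on the jobs' actual container sizes.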


            People

            • Assignee: Unassigned
            • Reporter: Herman Chen (herman@cloudera.com)
            • Votes: 0
            • Watchers: 12
