Hadoop YARN / YARN-1198 (Capacity Scheduler headroom calculation does not work as expected) / YARN-1680

availableResources sent to the ApplicationMaster in the heartbeat should exclude blacklisted nodes' free memory.


    Details

    • Type: Sub-task
    • Status: Open
    • Priority: Major
    • Resolution: Unresolved
    • Affects Version/s: 2.2.0, 2.3.0
    • Fix Version/s: None
    • Component/s: capacityscheduler
    • Labels:
      None
    • Environment:

      SuSE 11 SP2 + Hadoop-2.3

      Description

      There are 4 NodeManagers with 8 GB each, so the total cluster capacity is 32 GB. Reducer slow start (mapreduce.job.reduce.slowstart.completedmaps) is set to 1.

      The job's running reducer tasks occupy 29 GB of the cluster. One NodeManager (NM-4) became unstable (3 map tasks were killed), so the MRAppMaster blacklisted NM-4. All of the job's reducer tasks are now running in the cluster.

      The MRAppMaster does not preempt the reducers, because the headroom used in its reducer-preemption calculation includes the free memory of blacklisted nodes. This makes the job hang forever: the ResourceManager does not assign any new containers on blacklisted nodes, yet the availableResources it returns still counts the free memory of the whole cluster.
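      The failure mode above can be illustrated with a minimal Python sketch. All names (`Node`, `cluster_headroom_mb`, `blacklist_aware_headroom_mb`, `should_preempt_reducers`) are hypothetical and are not YARN or MapReduce APIs. The per-node split (the remaining 3 GB free on NM-4) and the 1 GB map container size are assumptions, since the report gives only cluster totals, and the preemption rule is a deliberately simplified stand-in for the MRAppMaster's real logic.

```python
from dataclasses import dataclass


@dataclass
class Node:
    """Hypothetical model of one NodeManager's memory, in MB."""
    name: str
    capacity_mb: int
    used_mb: int

    @property
    def free_mb(self) -> int:
        return self.capacity_mb - self.used_mb


def cluster_headroom_mb(nodes):
    """Behaviour described in the report: free memory summed over every node."""
    return sum(n.free_mb for n in nodes)


def blacklist_aware_headroom_mb(nodes, blacklisted):
    """Behaviour the issue asks for: skip free memory on blacklisted nodes."""
    return sum(n.free_mb for n in nodes if n.name not in blacklisted)


def should_preempt_reducers(headroom_mb, pending_maps, map_container_mb):
    """Simplified rule (assumption): preempt reducers when a pending map
    cannot fit into the reported headroom."""
    return pending_maps > 0 and headroom_mb < map_container_mb


# Scenario from the report: 4 x 8 GB NodeManagers, reducers hold 29 GB.
# ASSUMPTION: NM-1..NM-3 are full, so the remaining 3 GB is free on NM-4.
nodes = [
    Node("NM-1", 8192, 8192),
    Node("NM-2", 8192, 8192),
    Node("NM-3", 8192, 8192),
    Node("NM-4", 8192, 5120),
]
blacklisted = {"NM-4"}
pending_maps = 3          # the 3 killed maps, assumed to need re-running
map_container_mb = 1024   # ASSUMPTION: 1 GB map containers

naive = cluster_headroom_mb(nodes)
aware = blacklist_aware_headroom_mb(nodes, blacklisted)
print("cluster headroom:", naive, "MB -> preempt:",
      should_preempt_reducers(naive, pending_maps, map_container_mb))
print("blacklist-aware headroom:", aware, "MB -> preempt:",
      should_preempt_reducers(aware, pending_maps, map_container_mb))
```

      With the cluster-wide headroom, the 3 GB on NM-4 looks usable, so the simplified check never fires and the job waits indefinitely. Excluding NM-4 drops the headroom to zero, and the reducers would be preempted so the pending maps can be scheduled elsewhere.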

        Attachments

        1. YARN-1680.patch
          18 kB
          Chen He
        2. YARN-1680-v2.patch
          20 kB
          Chen He
        3. YARN-1680-v2.patch
          20 kB
          Chen He
        4. YARN-1680-WIP.patch
          7 kB
          Chen He


              People

              • Assignee:
                Unassigned
              • Reporter:
                Rohith Sharma K S (rohithsharma)
              • Votes:
                1
              • Watchers:
                28

                Dates

                • Created:
                • Updated: