Hadoop YARN / YARN-3415

Non-AM containers can be counted towards amResourceUsage of a Fair Scheduler queue

    Details

    • Type: Bug
    • Status: Closed
    • Priority: Critical
    • Resolution: Fixed
    • Affects Version/s: 2.6.0
    • Fix Version/s: 2.8.0, 3.0.0-alpha1
    • Component/s: fairscheduler
    • Labels:
      None
    • Target Version/s:
    • Hadoop Flags:
      Reviewed

      Description

      We encountered this problem while running a Spark cluster. The amResourceUsage for a queue became artificially high, and the cluster then deadlocked because the maxAMShare constraint kicked in and no new AM could be admitted to the cluster.

      I have described the problem in detail here: https://github.com/apache/spark/pull/5233#issuecomment-87160289

      In summary: the condition for counting a container's memory towards amResourceUsage is fragile, because it depends on the number of live containers belonging to the app. We saw the Spark AM go down without explicitly releasing its requested containers, after which the memory of one of those containers was counted towards amResourceUsage.
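
      The failure mode described above can be illustrated with a toy model. This is only a sketch under stated assumptions: `AmUsageSketch`, `Queue`, `AppAttempt`, `allocateFragile`, `allocateExplicit`, and the memory sizes are all hypothetical names and values chosen for illustration, not the Fair Scheduler's classes, and the "explicit" variant is just one possible sturdier check, not necessarily what the attached patches do. The model also never decrements the usage, to keep the over-counting easy to see.

      ```java
      import java.util.ArrayList;
      import java.util.List;

      // Hypothetical, simplified model; NOT the actual Fair Scheduler code.
      class AmUsageSketch {
          /** Toy queue tracking memory attributed to ApplicationMasters, in MB. */
          static class Queue {
              int amResourceUsageMb = 0;
          }

          /** Toy app attempt holding the memory sizes of its live containers. */
          static class AppAttempt {
              final Queue queue;
              final List<Integer> liveContainersMb = new ArrayList<>();
              boolean amContainerSeen = false; // used only by the explicit variant

              AppAttempt(Queue queue) {
                  this.queue = queue;
              }

              // Fragile heuristic: a container is treated as the AM whenever it
              // is the only live container of the app.
              void allocateFragile(int memoryMb) {
                  liveContainersMb.add(memoryMb);
                  if (liveContainersMb.size() == 1) {
                      queue.amResourceUsageMb += memoryMb;
                  }
              }

              // Assumed alternative: only the first container ever allocated to
              // the attempt is counted as the AM.
              void allocateExplicit(int memoryMb) {
                  liveContainersMb.add(memoryMb);
                  if (!amContainerSeen) {
                      amContainerSeen = true;
                      queue.amResourceUsageMb += memoryMb;
                  }
              }

              // Removes the live container at the given list position.
              void containerCompleted(int index) {
                  liveContainersMb.remove(index);
              }
          }

          /** Replays the scenario and returns the resulting AM usage in MB. */
          static int simulate(boolean fragile) {
              Queue queue = new Queue();
              AppAttempt app = new AppAttempt(queue);

              // 1. The AM container (1024 MB, illustrative) is allocated.
              if (fragile) app.allocateFragile(1024); else app.allocateExplicit(1024);

              // 2. The AM goes down; its outstanding requests are not released.
              app.containerCompleted(0);

              // 3. A previously requested non-AM container (4096 MB) is allocated.
              //    It is now the only live container of the app.
              if (fragile) app.allocateFragile(4096); else app.allocateExplicit(4096);

              return queue.amResourceUsageMb;
          }

          public static void main(String[] args) {
              System.out.println("fragile check:  " + simulate(true) + " MB");
              System.out.println("explicit check: " + simulate(false) + " MB");
          }
      }
      ```

      In this model the live-container-count check also attributes the 4096 MB non-AM container to the AM usage, while the explicit check counts only the 1024 MB AM container. An inflated figure of this kind is what can push a queue past its maxAMShare limit.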

      cc - Sandy Ryza

        Attachments

        1. YARN-3415.002.patch
          12 kB
          zhihai xu
        2. YARN-3415.001.patch
          12 kB
          zhihai xu
        3. YARN-3415.000.patch
          7 kB
          zhihai xu

            People

            • Assignee:
              zhihai xu (zxu)
            • Reporter:
              Rohit Agarwal (ragarwal)
