Hadoop YARN / YARN-2809

Implement workaround for Linux kernel panic when removing cgroup


Details

    • Type: Bug
    • Status: Closed
    • Priority: Major
    • Resolution: Fixed
    • Affects Version/s: 2.6.0
    • Fix Version/s: 2.7.0
    • Component/s: nodemanager
    • Labels: None
    • Environment: RHEL 6.4
    • Hadoop Flags: Reviewed

    Description

      Some older versions of Linux have a bug that can cause a kernel panic when the LinuxContainerExecutor (LCE) attempts to remove a cgroup. It is a race condition, so it is fairly rare, but on a cluster of a few thousand nodes it can result in a couple of panics per day.

      This commit likely fixes the problem in Linux (not verified): https://git.kernel.org/cgit/linux/kernel/git/stable/linux-stable.git/commit/?h=linux-2.6.39.y&id=068c5cc5ac7414a8e9eb7856b4bf3cc4d4744267

      Details will be added in comments.
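
      The description defers the details of the workaround to the comments, so the following is only an illustrative sketch and not the contents of the attached patches. It assumes, without confirmation from this description, that one way to reduce exposure to such a race is to avoid removing a cgroup directory while its cgroup v1 "tasks" file still lists process IDs, polling for a bounded time. The class name, method names, and timeout parameters are hypothetical.

```java
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Path;

// Hypothetical sketch: remove a cgroup directory only once it has no tasks.
// This is an assumed mitigation, not the logic of the YARN-2809 patches.
public class CgroupRemovalSketch {

    /** Returns true when the cgroup's "tasks" file is absent or lists no PIDs. */
    static boolean hasNoTasks(Path cgroupDir) throws IOException {
        Path tasks = cgroupDir.resolve("tasks");
        if (!Files.exists(tasks)) {
            return true;
        }
        for (String line : Files.readAllLines(tasks)) {
            if (!line.trim().isEmpty()) {
                return false; // at least one PID is still attached
            }
        }
        return true;
    }

    /**
     * Polls until the cgroup has no tasks, then attempts to remove the
     * directory. Returns false if tasks remain when the timeout expires
     * or if the removal itself fails.
     */
    static boolean removeWhenEmpty(Path cgroupDir, long timeoutMs, long pollMs)
            throws IOException, InterruptedException {
        long deadline = System.currentTimeMillis() + timeoutMs;
        do {
            if (hasNoTasks(cgroupDir)) {
                // File.delete() issues an rmdir for a directory.
                return cgroupDir.toFile().delete();
            }
            Thread.sleep(pollMs);
        } while (System.currentTimeMillis() < deadline);
        return false;
    }
}
```

      A caller might invoke removeWhenEmpty(path, 1000, 20) after a container exits and log a warning when it returns false. A check like this only narrows the window between the last task exiting and the removal; it does not by itself guarantee that the kernel race is avoided.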

      Attachments

        1. YARN-2809-v3.patch
          11 kB
          Nathan Roberts
        2. YARN-2809-v2.patch
          11 kB
          Nathan Roberts
        3. YARN-2809.patch
          11 kB
          Nathan Roberts

        Activity

          People

            Assignee: Nathan Roberts (nroberts)
            Reporter: Nathan Roberts (nroberts)
