Hadoop YARN / YARN-2809

Implement workaround for linux kernel panic when removing cgroup

    Details

    • Type: Bug
    • Status: Closed
    • Priority: Major
    • Resolution: Fixed
    • Affects Version/s: 2.6.0
    • Fix Version/s: 2.7.0
    • Component/s: nodemanager
    • Labels:
      None
    • Environment:

      RHEL 6.4

    • Target Version/s:
    • Hadoop Flags:
      Reviewed

      Description

      Some older versions of Linux have a bug that can cause a kernel panic when the LinuxContainerExecutor (LCE) attempts to remove a cgroup. The bug is a race condition, so it is rare on any single node, but on a cluster of a few thousand nodes it can result in a couple of panics per day.

      This commit likely fixes the problem in Linux (not yet verified): https://git.kernel.org/cgit/linux/kernel/git/stable/linux-stable.git/commit/?h=linux-2.6.39.y&id=068c5cc5ac7414a8e9eb7856b4bf3cc4d4744267

      Details will be added in comments.
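
      The description does not spell out the workaround itself. As a rough illustration only, one way to narrow a race around cgroup removal is to wait until the cgroup reports no member tasks before removing its directory. The sketch below assumes a cgroup v1 layout in which each cgroup directory contains a `tasks` file listing member PIDs; the class name, method names, and the timeout and polling parameters are hypothetical and are not taken from the attached patches. Whether this ordering actually avoids the kernel bug is not established here.

```java
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Path;

// Hypothetical sketch; not the code from the YARN-2809 patches.
public class CgroupRemovalSketch {

    // Returns true when the cgroup's "tasks" file lists no process IDs.
    // Assumes a cgroup v1 directory; throws if the file does not exist.
    static boolean isEmpty(Path cgroupDir) throws IOException {
        Path tasks = cgroupDir.resolve("tasks");
        return Files.readString(tasks).trim().isEmpty();
    }

    // Polls until the cgroup is empty, then removes its directory.
    // Returns false if the cgroup still has tasks when the timeout expires.
    static boolean removeWhenEmpty(Path cgroupDir, long timeoutMs, long pollMs)
            throws IOException, InterruptedException {
        long deadline = System.nanoTime() + timeoutMs * 1_000_000L;
        do {
            if (isEmpty(cgroupDir)) {
                // On a cgroup filesystem, removing the directory removes the
                // cgroup. On an ordinary filesystem this call fails because
                // the directory still contains the "tasks" file.
                return Files.deleteIfExists(cgroupDir);
            }
            Thread.sleep(pollMs);
        } while (System.nanoTime() < deadline);
        return false;
    }
}
```

      A caller might invoke `removeWhenEmpty(path, 1000, 20)` after a container exits and log a warning when it returns false, leaving the cgroup in place rather than forcing the removal.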

        Attachments

        1. YARN-2809-v3.patch
          11 kB
          Nathan Roberts
        2. YARN-2809-v2.patch
          11 kB
          Nathan Roberts
        3. YARN-2809.patch
          11 kB
          Nathan Roberts

            People

            • Assignee:
              nroberts Nathan Roberts
              Reporter:
              nroberts Nathan Roberts
            • Votes:
              0
              Watchers:
              17

              Dates

              • Created:
                Updated:
                Resolved: