Uploaded image for project: 'Hadoop YARN'
  1. Hadoop YARN
  2. YARN-8193

YARN RM hangs abruptly (stops allocating resources) when running successive applications.

    XMLWordPrintableJSON

Details

    • Bug
    • Status: Resolved
    • Blocker
    • Resolution: Fixed
    • None
    • 3.2.0, 3.1.1, 2.10.1
    • yarn
    • None
    • Reviewed

    Description

      When running massive queries successively, at some point RM just hangs and stops allocating resources. At the point RM get hangs, YARN throw NullPointerException  at RegularContainerAllocator.getLocalityWaitFactor.

      There's sufficient space given to yarn.nodemanager.local-dirs (not a node health issue, RM didn't report any node being unhealthy). There is no fixed trigger for this (query or operation).

      This problem goes away on restarting ResourceManager. No NM restart is required. 

       

       

      Attachments

        1. YARN-8193.001.patch
          5 kB
          Zian Chen
        2. YARN-8193.002.patch
          5 kB
          Zian Chen
        3. YARN-8193-branch-2.10-001.patch
          5 kB
          Jonathan Hung
        4. YARN-8193-branch-2.9.0-001.patch
          5 kB
          Juanjuan Tian
        5. YARN-8193-branch-2-001.patch
          5 kB
          Jason Darrell Lowe

        Issue Links

          Activity

            People

              Zian Chen Zian Chen
              Zian Chen Zian Chen
              Votes:
              0 Vote for this issue
              Watchers:
              17 Start watching this issue

              Dates

                Created:
                Updated:
                Resolved: