Uploaded image for project: 'Hadoop YARN'
  1. Hadoop YARN
  2. YARN-8193

YARN RM hangs abruptly (stops allocating resources) when running successive applications.

VotersWatch issueWatchersCreate sub-taskLinkCloneUpdate Comment AuthorReplace String in CommentUpdate Comment VisibilityDelete Comments
    XMLWordPrintableJSON

Details

    • Bug
    • Status: Resolved
    • Blocker
    • Resolution: Fixed
    • None
    • 3.2.0, 3.1.1, 2.10.1
    • yarn
    • None
    • Reviewed

    Description

      When running massive queries successively, at some point RM just hangs and stops allocating resources. At the point RM get hangs, YARN throw NullPointerException  at RegularContainerAllocator.getLocalityWaitFactor.

      There's sufficient space given to yarn.nodemanager.local-dirs (not a node health issue, RM didn't report any node being unhealthy). There is no fixed trigger for this (query or operation).

      This problem goes away on restarting ResourceManager. No NM restart is required. 

       

       

      Attachments

        1. YARN-8193-branch-2-001.patch
          5 kB
          Jason Darrell Lowe
        2. YARN-8193-branch-2.9.0-001.patch
          5 kB
          Juanjuan Tian
        3. YARN-8193-branch-2.10-001.patch
          5 kB
          Jonathan Hung
        4. YARN-8193.002.patch
          5 kB
          Zian Chen
        5. YARN-8193.001.patch
          5 kB
          Zian Chen

        Issue Links

        Activity

          This comment will be Viewable by All Users Viewable by All Users
          Cancel

          People

            Zian Chen Zian Chen
            Zian Chen Zian Chen
            Votes:
            0 Vote for this issue
            Watchers:
            17 Start watching this issue

            Dates

              Created:
              Updated:
              Resolved:

              Slack

                Issue deployment