Hadoop YARN / YARN-10112

Livelock (Runnable FairScheduler.getAppWeight) in Resource Manager when used with Fair Scheduler size based weights enabled


    Details

    • Type: Bug
    • Status: Resolved
    • Priority: Minor
    • Resolution: Fixed
    • Affects Version/s: 2.8.5
    • Fix Version/s: 3.0.0
    • Component/s: fairscheduler
    • Labels:
      None

      Description

      The user runs the FairScheduler with yarn.scheduler.fair.sizebasedweight set to true. In the jstack thread dump provided by the support engineers, the method getAppWeight in the FairScheduler class holds the FairScheduler object monitor continuously, so org.apache.hadoop.yarn.server.resourcemanager.scheduler.fair.FairScheduler.nodeUpdate waits indefinitely to enter the same object monitor, resulting in the livelock.
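
      For readers unfamiliar with size-based weights, here is a minimal sketch of the kind of computation the Math.log1p frame in getAppWeight points to. It assumes the weight is log2(1 + memory demand); the class and method names are hypothetical and this is not the actual FairScheduler code.

```java
// Hypothetical illustration only; not the Hadoop implementation.
final class SizeBasedWeightSketch {

    // Assumption: with size-based weights enabled, an app's weight grows
    // logarithmically with its memory demand: log2(1 + demand).
    static double sizeBasedWeight(long demandMemoryMb) {
        return Math.log1p(demandMemoryMb) / Math.log(2);
    }

    public static void main(String[] args) {
        long[] demands = {0L, 1023L, 1048575L};
        for (long demand : demands) {
            System.out.println(demand + " MB -> weight " + sizeBasedWeight(demand));
        }
    }
}
```

      Under this assumption, an app with no outstanding demand gets a weight of 0, and large apps gain weight only slowly.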

       

      The issue occurs very infrequently, and we have not yet found a way to reproduce it consistently. It resembles what YARN-1458 reports, but that code fix appears to have been in effect since 2.6.

       

       

      "ResourceManager Event Processor" #17 prio=5 os_prio=0 tid=0x00007fbcee65e800 nid=0x2ea4 waiting for monitor entry [0x00007fbcbcd5e000]
         java.lang.Thread.State: BLOCKED (on object monitor)
              at org.apache.hadoop.yarn.server.resourcemanager.scheduler.fair.FairScheduler.nodeUpdate(FairScheduler.java:1105)
              - waiting to lock <0x00000006eb816b18> (a org.apache.hadoop.yarn.server.resourcemanager.scheduler.fair.FairScheduler)
              at org.apache.hadoop.yarn.server.resourcemanager.scheduler.fair.FairScheduler.handle(FairScheduler.java:1362)
              at org.apache.hadoop.yarn.server.resourcemanager.scheduler.fair.FairScheduler.handle(FairScheduler.java:129)
              at org.apache.hadoop.yarn.server.resourcemanager.ResourceManager$SchedulerEventDispatcher$EventProcessor.run(ResourceManager.java:801)
              at java.lang.Thread.run(Thread.java:748)

      "FairSchedulerUpdateThread" #23 daemon prio=5 os_prio=0 tid=0x00007fbceea0e800 nid=0x2ea2 runnable [0x00007fbcbcf60000]
         java.lang.Thread.State: RUNNABLE
              at java.lang.StrictMath.log1p(Native Method)
              at java.lang.Math.log1p(Math.java:1747)
              at org.apache.hadoop.yarn.server.resourcemanager.scheduler.fair.FairScheduler.getAppWeight(FairScheduler.java:570)
              - locked <0x00000006eb816b18> (a org.apache.hadoop.yarn.server.resourcemanager.scheduler.fair.FairScheduler)
              at org.apache.hadoop.yarn.server.resourcemanager.scheduler.fair.FSAppAttempt.getWeights(FSAppAttempt.java:953)
              at org.apache.hadoop.yarn.server.resourcemanager.scheduler.fair.policies.ComputeFairShares.computeShare(ComputeFairShares.java:192)
              at org.apache.hadoop.yarn.server.resourcemanager.scheduler.fair.policies.ComputeFairShares.resourceUsedWithWeightToResourceRatio(ComputeFairShares.java:180)
              at org.apache.hadoop.yarn.server.resourcemanager.scheduler.fair.policies.ComputeFairShares.computeSharesInternal(ComputeFairShares.java:140)
              at org.apache.hadoop.yarn.server.resourcemanager.scheduler.fair.policies.ComputeFairShares.computeShares(ComputeFairShares.java:51)
              at org.apache.hadoop.yarn.server.resourcemanager.scheduler.fair.policies.FairSharePolicy.computeShares(FairSharePolicy.java:138)
              at org.apache.hadoop.yarn.server.resourcemanager.scheduler.fair.FSLeafQueue.recomputeShares(FSLeafQueue.java:235)
              at org.apache.hadoop.yarn.server.resourcemanager.scheduler.fair.FSParentQueue.recomputeShares(FSParentQueue.java:89)
              at org.apache.hadoop.yarn.server.resourcemanager.scheduler.fair.FairScheduler.update(FairScheduler.java:365)
              - locked <0x00000006eb816b18> (a org.apache.hadoop.yarn.server.resourcemanager.scheduler.fair.FairScheduler)
              at org.apache.hadoop.yarn.server.resourcemanager.scheduler.fair.FairScheduler$UpdateThread.run(FairScheduler.java:314)
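
      The two stacks show one thread holding the scheduler monitor while another is BLOCKED trying to enter it. The sketch below reproduces that pattern in isolation, assuming two synchronized methods on one object; every name in it is hypothetical, and the holder here simply parks on a latch rather than spinning in a computation as the real update thread appears to.

```java
import java.util.concurrent.CountDownLatch;
import java.util.concurrent.TimeUnit;

// Hypothetical illustration of monitor contention; not Hadoop code.
final class MonitorContentionSketch {
    private final CountDownLatch holderInside = new CountDownLatch(1);
    private final CountDownLatch release = new CountDownLatch(1);

    // Stands in for the update path: keeps the monitor until released.
    // CountDownLatch.await() parks the thread without giving up the monitor.
    synchronized void update() throws InterruptedException {
        holderInside.countDown();
        release.await();
    }

    // Stands in for nodeUpdate: needs the same monitor to proceed.
    synchronized void nodeUpdate() {
        // no work needed for the demonstration
    }

    // Returns the state observed for the second thread while the first
    // thread still holds the monitor.
    static Thread.State observeWaiterState() throws InterruptedException {
        MonitorContentionSketch scheduler = new MonitorContentionSketch();
        Thread holder = new Thread(() -> {
            try {
                scheduler.update();
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt();
            }
        }, "UpdateThreadSketch");
        Thread waiter = new Thread(scheduler::nodeUpdate, "EventProcessorSketch");

        holder.start();
        scheduler.holderInside.await();   // holder now owns the monitor
        waiter.start();

        // Poll for up to five seconds until the waiter reports BLOCKED.
        long deadline = System.nanoTime() + TimeUnit.SECONDS.toNanos(5);
        while (waiter.getState() != Thread.State.BLOCKED
                && System.nanoTime() < deadline) {
            Thread.sleep(10);
        }
        Thread.State observed = waiter.getState();

        scheduler.release.countDown();    // let both threads finish
        holder.join();
        waiter.join();
        return observed;
    }

    public static void main(String[] args) throws InterruptedException {
        System.out.println("waiter state: " + observeWaiterState());
    }
}
```

      In the reported dump the holder never leaves the synchronized region, so the blocked thread has no equivalent of the release step.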

            People

            • Assignee:
              wilfreds Wilfred Spiegelenburg
              Reporter:
              aywanga Yu Wang
