Hadoop YARN / YARN-1458

FairScheduler: Zero weight can lead to livelock


    Details

    • Type: Bug
    • Status: Resolved
    • Priority: Major
    • Resolution: Fixed
    • Affects Version/s: 2.2.0
    • Fix Version/s: 2.6.0
    • Component/s: scheduler
    • Labels:
      None
    • Environment:

      CentOS, kernel 2.6.18-238.19.1.el5, x86_64
      Hadoop 2.2.0

    • Target Version/s:

      Description

      The ResourceManager$SchedulerEventDispatcher$EventProcessor thread blocks when clients submit many jobs. The problem is hard to reproduce; we had to run the test cluster for days before it occurred. Output of jstack on the ResourceManager PID:

       "ResourceManager Event Processor" prio=10 tid=0x00002aaab0c5f000 nid=0x5dd3 waiting for monitor entry [0x0000000043aa9000]
         java.lang.Thread.State: BLOCKED (on object monitor)
              at org.apache.hadoop.yarn.server.resourcemanager.scheduler.fair.FairScheduler.removeApplication(FairScheduler.java:671)
              - waiting to lock <0x000000070026b6e0> (a org.apache.hadoop.yarn.server.resourcemanager.scheduler.fair.FairScheduler)
              at org.apache.hadoop.yarn.server.resourcemanager.scheduler.fair.FairScheduler.handle(FairScheduler.java:1023)
              at org.apache.hadoop.yarn.server.resourcemanager.scheduler.fair.FairScheduler.handle(FairScheduler.java:112)
              at org.apache.hadoop.yarn.server.resourcemanager.ResourceManager$SchedulerEventDispatcher$EventProcessor.run(ResourceManager.java:440)
              at java.lang.Thread.run(Thread.java:744)
      ……
      "FairSchedulerUpdateThread" daemon prio=10 tid=0x00002aaab0a2c800 nid=0x5dc8 runnable [0x00000000433a2000]
         java.lang.Thread.State: RUNNABLE
              at org.apache.hadoop.yarn.server.resourcemanager.scheduler.fair.FairScheduler.getAppWeight(FairScheduler.java:545)
              - locked <0x000000070026b6e0> (a org.apache.hadoop.yarn.server.resourcemanager.scheduler.fair.FairScheduler)
              at org.apache.hadoop.yarn.server.resourcemanager.scheduler.fair.AppSchedulable.getWeights(AppSchedulable.java:129)
              at org.apache.hadoop.yarn.server.resourcemanager.scheduler.fair.policies.ComputeFairShares.computeShare(ComputeFairShares.java:143)
              at org.apache.hadoop.yarn.server.resourcemanager.scheduler.fair.policies.ComputeFairShares.resourceUsedWithWeightToResourceRatio(ComputeFairShares.java:131)
              at org.apache.hadoop.yarn.server.resourcemanager.scheduler.fair.policies.ComputeFairShares.computeShares(ComputeFairShares.java:102)
              at org.apache.hadoop.yarn.server.resourcemanager.scheduler.fair.policies.FairSharePolicy.computeShares(FairSharePolicy.java:119)
              at org.apache.hadoop.yarn.server.resourcemanager.scheduler.fair.FSLeafQueue.recomputeShares(FSLeafQueue.java:100)
              at org.apache.hadoop.yarn.server.resourcemanager.scheduler.fair.FSParentQueue.recomputeShares(FSParentQueue.java:62)
              at org.apache.hadoop.yarn.server.resourcemanager.scheduler.fair.FairScheduler.update(FairScheduler.java:282)
              - locked <0x000000070026b6e0> (a org.apache.hadoop.yarn.server.resourcemanager.scheduler.fair.FairScheduler)
              at org.apache.hadoop.yarn.server.resourcemanager.scheduler.fair.FairScheduler$UpdateThread.run(FairScheduler.java:255)
              at java.lang.Thread.run(Thread.java:744)
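      The trace shows FairSchedulerUpdateThread holding the FairScheduler monitor inside ComputeFairShares, so removeApplication can never acquire it. The sketch below is a simplified, hypothetical model of how a zero weight could keep that search from ever finishing. The class, its fields, and the maxSteps cap are illustrative assumptions, not Hadoop code. It also assumes the 2.2-era search doubles an upper bound rMax until the summed, int-truncated shares reach the total resource. With every weight at zero, each share collapses to its min share (here 0). Once rMax overflows to Infinity, 0 * Infinity is NaN, Math.max/Math.min propagate the NaN, and (int) NaN is 0, so the loop condition never becomes false.

```java
/**
 * Hypothetical, simplified model of the weight-to-resource ratio search.
 * Not Hadoop code: names and the maxSteps cap are illustrative assumptions.
 */
public class ZeroWeightLivelockDemo {

    /** Illustrative stand-in for a schedulable: weight, min share, max share. */
    static final class Sched {
        final double weight;
        final int minShare;
        final int maxShare;

        Sched(double weight, int minShare, int maxShare) {
            this.weight = weight;
            this.minShare = minShare;
            this.maxShare = maxShare;
        }
    }

    /** Share at a given ratio, clamped to [minShare, maxShare] and truncated to int (assumed). */
    static int computeShare(Sched s, double w2rRatio) {
        double share = s.weight * w2rRatio;   // 0.0 * Infinity yields NaN
        share = Math.max(share, s.minShare);  // Math.max returns NaN if either argument is NaN
        share = Math.min(share, s.maxShare);
        return (int) share;                   // (int) NaN is 0
    }

    /** Total resource all schedulables would use at the given ratio. */
    static int resourceUsed(Sched[] scheds, double w2rRatio) {
        int total = 0;
        for (Sched s : scheds) {
            total += computeShare(s, w2rRatio);
        }
        return total;
    }

    /**
     * Doubles rMax until the used resource reaches totalResource.
     * Returns the number of doublings, or -1 once maxSteps is exhausted;
     * without the cap, the zero-weight case would spin forever.
     */
    static int findUpperBound(Sched[] scheds, int totalResource, int maxSteps) {
        double rMax = 1.0;
        int steps = 0;
        while (resourceUsed(scheds, rMax) < totalResource) {
            if (steps == maxSteps) {
                return -1;
            }
            rMax *= 2.0;
            steps++;
        }
        return steps;
    }

    public static void main(String[] args) {
        Sched[] weighted = {
            new Sched(1.0, 0, Integer.MAX_VALUE), new Sched(1.0, 0, Integer.MAX_VALUE)
        };
        Sched[] zeroWeight = {
            new Sched(0.0, 0, Integer.MAX_VALUE), new Sched(0.0, 0, Integer.MAX_VALUE)
        };
        System.out.println(findUpperBound(weighted, 8192, 2000));   // 12
        System.out.println(findUpperBound(zeroWeight, 8192, 2000)); // -1
    }
}
```

      The cap exists only so the demo terminates. It is not necessarily how the fix released in 2.6.0 addresses the problem.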
      

        Attachments

        1. YARN-1458.addendum.patch
          1 kB
          Zhihai Xu
        2. yarn-1458-8.patch
          11 kB
          Zhihai Xu
        3. yarn-1458-7.patch
          12 kB
          Karthik Kambatla
        4. YARN-1458.006.patch
          14 kB
          Zhihai Xu
        5. YARN-1458.alternative2.patch
          11 kB
          Zhihai Xu
        6. yarn-1458-5.patch
          9 kB
          Karthik Kambatla
        7. YARN-1458.alternative1.patch
          9 kB
          Zhihai Xu
        8. YARN-1458.alternative0.patch
          6 kB
          Zhihai Xu
        9. YARN-1458.004.patch
          9 kB
          Zhihai Xu
        10. YARN-1458.003.patch
          7 kB
          Zhihai Xu
        11. YARN-1458.002.patch
          7 kB
          Zhihai Xu
        12. YARN-1458.001.patch
          5 kB
          Zhihai Xu
        13. YARN-1458.patch
          1 kB
          qingwu.fu


              People

              • Assignee: Zhihai Xu (zxu)
                Reporter: qingwu.fu
              • Votes: 3
                Watchers: 16

                Dates

                • Created:
                  Updated:
                  Resolved:

                  Time Tracking

                  Original Estimate: 408h
                  Remaining Estimate: 408h
                  Time Spent: Not Specified