  Hadoop YARN / YARN-8513

CapacityScheduler infinite loop when queue is near fully utilized

Details

    • Type: Bug
    • Status: Resolved
    • Priority: Major
    • Resolution: Duplicate
    • Affects Version/s: 3.1.0, 2.9.1
    • Fix Version/s: None
    • Component/s: capacity scheduler, yarn
    • Labels: None
    • Environment: Ubuntu 14.04.5 and 16.04.4. YARN is configured with one label and 5 queues.

    Description

      The ResourceManager (RM) sometimes stops responding to all requests when a queue is nearly fully utilized. Sending SIGTERM does not stop the RM; only SIGKILL does. After a restart, the RM recovers the running jobs and starts accepting new ones.

      The CapacityScheduler appears to be stuck in an infinite loop, printing the following log messages at more than 25,000 lines per second:

       

      2018-07-10 17:16:29,227 INFO org.apache.hadoop.yarn.server.resourcemanager.scheduler.capacity.ParentQueue: assignedContainer queue=root usedCapacity=0.99816763 absoluteUsedCapacity=0.99816763 used=<memory:16170624, vCores:1577> cluster=<memory:29441544, vCores:5792>
      2018-07-10 17:16:29,227 INFO org.apache.hadoop.yarn.server.resourcemanager.scheduler.capacity.CapacityScheduler: Failed to accept allocation proposal
      2018-07-10 17:16:29,227 INFO org.apache.hadoop.yarn.server.resourcemanager.scheduler.capacity.allocator.AbstractContainerAllocator: assignedContainer application attempt=appattempt_1530619767030_1652_000001 container=null queue=org.apache.hadoop.yarn.server.resourcemanager.scheduler.capacity.allocator.RegularContainerAllocator@14420943 clusterResource=<memory:29441544, vCores:5792> type=NODE_LOCAL requestedPartition=
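
      One rough way to quantify the log rate is to count lines per second for each logger class. The following is a minimal sketch that assumes the line layout shown in the excerpt above (timestamp, level, fully qualified class name, colon). The function names `lines_per_second` and `busiest` and the threshold value are hypothetical, chosen only for illustration.

```python
import re
from collections import Counter

# Captures the timestamp (to the second) and the logger class from lines
# shaped like the excerpt above. The layout is an assumption.
LINE_RE = re.compile(
    r"^(\d{4}-\d{2}-\d{2} \d{2}:\d{2}:\d{2}),\d{3} \w+ ([\w.$]+):"
)


def lines_per_second(lines):
    """Count log lines per (second, short logger name)."""
    counts = Counter()
    for line in lines:
        m = LINE_RE.match(line.strip())
        if m:
            second, logger = m.groups()
            # Keep only the simple class name, e.g. "CapacityScheduler".
            counts[(second, logger.rsplit(".", 1)[-1])] += 1
    return counts


def busiest(counts, threshold=1000):
    """Return (second, logger, count) entries at or above threshold, busiest first."""
    hot = [(s, name, n) for (s, name), n in counts.items() if n >= threshold]
    return sorted(hot, key=lambda t: -t[2])
```

      Passing an open RM log file to `lines_per_second` and the result to `busiest` lists the seconds in which a single logger exceeded the threshold. Lines that do not match the assumed layout, such as stack traces, are skipped.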

       

      I have encountered this problem several times since upgrading to YARN 2.9.1; the same configuration works fine under version 2.7.3.

      YARN-4477 is an infinite-loop bug in FairScheduler; I am not sure whether this is a similar problem.

       

      Attachments

        1. jstack-1.log (173 kB, Chen Yufei)
        2. jstack-2.log (171 kB, Chen Yufei)
        3. jstack-3.log (169 kB, Chen Yufei)
        4. jstack-4.log (170 kB, Chen Yufei)
        5. jstack-5.log (173 kB, Chen Yufei)
        6. top-during-lock.log (2 kB, Chen Yufei)
        7. top-when-normal.log (2 kB, Chen Yufei)
        8. yarn3-jstack1.log (151 kB, Chen Yufei)
        9. yarn3-jstack2.log (151 kB, Chen Yufei)
        10. yarn3-jstack3.log (223 kB, Chen Yufei)
        11. yarn3-jstack4.log (152 kB, Chen Yufei)
        12. yarn3-jstack5.log (153 kB, Chen Yufei)
        13. yarn3-resourcemanager.log (798 kB, Chen Yufei)
        14. yarn3-top (9 kB, Chen Yufei)

            People

              Assignee: Unassigned
              Reporter: Chen Yufei (cyfdecyf)
              Votes: 1
              Watchers: 12
