Hadoop YARN / YARN-7636

Re-reservation count may overflow when cluster resource exhausted for a long time

    Details

    • Type: Bug
    • Status: Resolved
    • Priority: Major
    • Resolution: Fixed
    • Affects Version/s: 3.1.0, 2.9.1
    • Fix Version/s: 3.1.0, 2.9.1, 3.0.3
    • Component/s: capacityscheduler
    • Labels:
      None
    • Hadoop Flags:
      Reviewed

      Description

      This has happened twice on our production cluster. When a request cannot be satisfied for a long time, it continually triggers re-reservation, and the re-reservation count eventually overflows. The resulting exception crashes the scheduler.

      Exception stack:

      java.lang.IllegalArgumentException: Overflow adding 1 occurrences to a count of 2147483647
              at com.google.common.collect.ConcurrentHashMultiset.add(ConcurrentHashMultiset.java:246)
              at com.google.common.collect.AbstractMultiset.add(AbstractMultiset.java:80)
              at com.google.common.collect.ConcurrentHashMultiset.add(ConcurrentHashMultiset.java:51)
              at org.apache.hadoop.yarn.server.resourcemanager.scheduler.SchedulerApplicationAttempt.addReReservation(SchedulerApplicationAttempt.java:406)
              at org.apache.hadoop.yarn.server.resourcemanager.scheduler.SchedulerApplicationAttempt.reserve(SchedulerApplicationAttempt.java:555)
              at org.apache.hadoop.yarn.server.resourcemanager.scheduler.common.fica.FiCaSchedulerApp.reserve(FiCaSchedulerApp.java:1076)
              at org.apache.hadoop.yarn.server.resourcemanager.scheduler.common.fica.FiCaSchedulerApp.apply(FiCaSchedulerApp.java:795)
              at org.apache.hadoop.yarn.server.resourcemanager.scheduler.capacity.CapacityScheduler.tryCommit(CapacityScheduler.java:2770)
              at org.apache.hadoop.yarn.server.resourcemanager.scheduler.capacity.CapacityScheduler$ResourceCommitterService.run(CapacityScheduler.java:546)
      

      Following the handling in SchedulerApplicationAttempt#addSchedulingOpportunity, we can catch and ignore this exception to avoid the problem.

      The same problem may also occur in SchedulerApplicationAttempt#addMissedNonPartitionedRequestSchedulingOpportunity; it should be fixed in the same way.
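
      The catch-and-ignore approach described above might look roughly like the following sketch. It is not the attached patch. OverflowCheckedCounter is a hypothetical stand-in for the multiset in the stack trace (it only imitates the overflow check), and ReReservationTracker, setCount, and the accessor names are likewise assumptions made for illustration.

```java
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.ConcurrentMap;

/**
 * Hypothetical stand-in for a counting multiset. Like the multiset in the
 * stack trace, it throws IllegalArgumentException when a count would
 * exceed Integer.MAX_VALUE.
 */
final class OverflowCheckedCounter<K> {
    private final ConcurrentMap<K, Integer> counts = new ConcurrentHashMap<>();

    /** Adds one occurrence of key; throws if the count is already at the maximum. */
    void add(K key) {
        counts.merge(key, 1, (current, one) -> {
            if (current == Integer.MAX_VALUE) {
                throw new IllegalArgumentException(
                    "Overflow adding 1 occurrences to a count of " + current);
            }
            return current + 1;
        });
    }

    /** Illustration-only helper to jump straight to a large count. */
    void setCount(K key, int count) {
        counts.put(key, count);
    }

    int count(K key) {
        return counts.getOrDefault(key, 0);
    }
}

/** Hypothetical wrapper showing the catch-and-ignore handling. */
final class ReReservationTracker {
    private final OverflowCheckedCounter<String> reReservations =
        new OverflowCheckedCounter<>();

    void addReReservation(String schedulerKey) {
        try {
            reReservations.add(schedulerKey);
        } catch (IllegalArgumentException e) {
            // The count is already at Integer.MAX_VALUE. Ignoring the
            // exception leaves it saturated instead of failing the caller.
        }
    }

    /** Illustration-only helper, delegates to the counter. */
    void setReReservations(String schedulerKey, int count) {
        reReservations.setCount(schedulerKey, count);
    }

    int getReReservations(String schedulerKey) {
        return reReservations.count(schedulerKey);
    }
}
```

      With this handling the count simply stays at Integer.MAX_VALUE once it is reached. That seems acceptable here on the assumption that callers only compare the count against much smaller thresholds.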

        Attachments

        1. YARN-7636.003.patch
          2 kB
          Tao Yang
        2. YARN-7636.002.patch
          2 kB
          Tao Yang
        3. YARN-7636.001.patch
          1 kB
          Tao Yang

            People

            • Assignee:
              Tao Yang
            • Reporter:
              Tao Yang
            • Votes:
              0
            • Watchers:
              4
