[YARN-7636] Re-reservation count may overflow when cluster resource exhausted for a long time - ASF JIRA

Details

Type: Bug
Status: Resolved
Priority: Major
Resolution: Fixed
Affects Version/s: 3.1.0, 2.9.1
Fix Version/s: 3.1.0, 2.9.1, 3.0.3
Component/s: capacityscheduler
Labels: None
Hadoop Flags: Reviewed

Description

This has happened twice on our production cluster. When a request cannot be satisfied for a long time, it continually triggers re-reservation, and the re-reservation count eventually overflows. The resulting exception crashes the scheduler.

Exception stack:

java.lang.IllegalArgumentException: Overflow adding 1 occurrences to a count of 2147483647
        at com.google.common.collect.ConcurrentHashMultiset.add(ConcurrentHashMultiset.java:246)
        at com.google.common.collect.AbstractMultiset.add(AbstractMultiset.java:80)
        at com.google.common.collect.ConcurrentHashMultiset.add(ConcurrentHashMultiset.java:51)
        at org.apache.hadoop.yarn.server.resourcemanager.scheduler.SchedulerApplicationAttempt.addReReservation(SchedulerApplicationAttempt.java:406)
        at org.apache.hadoop.yarn.server.resourcemanager.scheduler.SchedulerApplicationAttempt.reserve(SchedulerApplicationAttempt.java:555)
        at org.apache.hadoop.yarn.server.resourcemanager.scheduler.common.fica.FiCaSchedulerApp.reserve(FiCaSchedulerApp.java:1076)
        at org.apache.hadoop.yarn.server.resourcemanager.scheduler.common.fica.FiCaSchedulerApp.apply(FiCaSchedulerApp.java:795)
        at org.apache.hadoop.yarn.server.resourcemanager.scheduler.capacity.CapacityScheduler.tryCommit(CapacityScheduler.java:2770)
        at org.apache.hadoop.yarn.server.resourcemanager.scheduler.capacity.CapacityScheduler$ResourceCommitterService.run(CapacityScheduler.java:546)

Following the handling in SchedulerApplicationAttempt#addSchedulingOpportunity, we can ignore this exception to avoid the problem.

The same problem may also happen in SchedulerApplicationAttempt#addMissedNonPartitionedRequestSchedulingOpportunity; it should be fixed in the same way.
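
As a rough illustration of the "ignore the exception" approach, the sketch below catches the overflow error and leaves the count pinned at its maximum. It is not the code from the attached patches. To stay free of external dependencies it uses a small hypothetical stand-in, CountingMultiset, that mimics the overflow check seen in the stack trace instead of Guava's ConcurrentHashMultiset. ReReservationTracker, seed, and the key "key-1" are likewise assumptions made for the example.

```java
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.ConcurrentMap;

/**
 * Hypothetical stand-in for a multiset whose add() refuses to overflow an int
 * count, mirroring the error message in the stack trace above.
 */
final class CountingMultiset<K> {
    private final ConcurrentMap<K, Integer> counts = new ConcurrentHashMap<>();

    void add(K key) {
        counts.compute(key, (k, current) -> {
            int c = (current == null) ? 0 : current;
            if (c == Integer.MAX_VALUE) {
                // compute() rethrows this and leaves the stored count unchanged.
                throw new IllegalArgumentException(
                        "Overflow adding 1 occurrences to a count of " + c);
            }
            return c + 1;
        });
    }

    void setCount(K key, int count) {
        counts.put(key, count);
    }

    int count(K key) {
        return counts.getOrDefault(key, 0);
    }
}

/** Hypothetical tracker showing the "ignore the overflow" handling. */
final class ReReservationTracker {
    private final CountingMultiset<String> reReservations = new CountingMultiset<>();

    void addReReservation(String schedulerKey) {
        try {
            reReservations.add(schedulerKey);
        } catch (IllegalArgumentException e) {
            // The count is already Integer.MAX_VALUE; ignore the overflow so
            // the caller (the scheduler thread in the report) keeps running.
        }
    }

    int getReReservations(String schedulerKey) {
        return reReservations.count(schedulerKey);
    }

    /** Test helper (assumption): jump straight to a given count. */
    void seed(String schedulerKey, int count) {
        reReservations.setCount(schedulerKey, count);
    }
}

final class ReReservationDemo {
    public static void main(String[] args) {
        ReReservationTracker tracker = new ReReservationTracker();
        tracker.seed("key-1", Integer.MAX_VALUE);
        tracker.addReReservation("key-1"); // would have thrown without the catch
        System.out.println(tracker.getReReservations("key-1")); // prints 2147483647
    }
}
```

The design choice here is saturation: once the counter reaches Integer.MAX_VALUE it stops growing, which loses precision in the count but keeps the scheduling thread alive. The same try/catch shape would apply to any other counter that can hit the same limit.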

Attachments

- YARN-7636.003.patch, 15/Mar/18 03:14, 2 kB, Tao Yang
- YARN-7636.002.patch, 14/Mar/18 09:16, 2 kB, Tao Yang
- YARN-7636.001.patch, 11/Dec/17 11:23, 1 kB, Tao Yang

People

Assignee: Tao Yang

Reporter: Tao Yang

Votes: 0

Watchers: 4

Dates

Created: 11/Dec/17 11:22

Updated: 05/Apr/18 20:44

Resolved: 16/Mar/18 11:10