[MESOS-9889] Master CPU high due to unexpected foreachkey behaviour in Master::__reregisterSlave. - ASF JIRA

Details

Type: Bug
Status: Resolved
Priority: Critical
Resolution: Fixed
Affects Version/s: None
Fix Version/s: 1.6.3, 1.7.3, 1.8.2, 1.9.1, 1.10.0
Component/s: None
Labels: foundations
Target Version/s: 1.6.3, 1.7.3, 1.8.2, 1.9.1

Description

The relevant code is at https://github.com/apache/mesos/blob/9932550e9632e7fbb9a45b217793c7f508f57001/src/master/master.cpp#L7707-L7708:

void Master::__reregisterSlave(
...
    foreachkey (FrameworkID frameworkId,
               slaves.unreachableTasks.at(slaveInfo.id())) {
        ...
        foreach (TaskID taskId,
                 slaves.unreachableTasks.at(slaveInfo.id()).get(frameworkId)) {

In our case, when the network flaps and 3 to 4 agents reregister, the master's CPU usage saturates and it cannot process any requests during that period.

After applying the following change:

-    foreachkey (FrameworkID frameworkId,
-               slaves.unreachableTasks.at(slaveInfo.id())) {
+    foreach (FrameworkID frameworkId,
+               slaves.unreachableTasks.at(slaveInfo.id()).keys()) {

The problem went away.
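
A likely explanation, consistent with the linked MESOS-5037: on a multimap, foreachkey visits a key once per stored value rather than once per distinct key, so the nested loop repeats the full per-framework task lookup for every task. The sketch below illustrates that effect, with std::unordered_multimap standing in for the Mesos container. The type alias, function names, and sample keys are hypothetical and not taken from the Mesos source.

```cpp
#include <cstddef>
#include <set>
#include <string>
#include <unordered_map>

// Hypothetical stand-in for the per-agent framework -> task multimap.
using TaskMap = std::unordered_multimap<std::string, std::string>;

// Mimics the pattern before the change: iterate every entry (so a key
// repeats once per value), then walk all values stored under that key.
std::size_t visitsPerEntry(const TaskMap& tasks) {
  std::size_t visits = 0;
  for (const auto& entry : tasks) {
    auto range = tasks.equal_range(entry.first);
    for (auto it = range.first; it != range.second; ++it) {
      ++visits;
    }
  }
  return visits;
}

// Mimics the pattern after the change: collect the distinct keys first,
// then walk the values under each key exactly once.
std::size_t visitsPerDistinctKey(const TaskMap& tasks) {
  std::set<std::string> keys;
  for (const auto& entry : tasks) {
    keys.insert(entry.first);
  }

  std::size_t visits = 0;
  for (const auto& key : keys) {
    auto range = tasks.equal_range(key);
    for (auto it = range.first; it != range.second; ++it) {
      ++visits;
    }
  }
  return visits;
}

// Builds a map holding `n` tasks under a single framework key.
TaskMap makeTasks(std::size_t n) {
  TaskMap tasks;
  for (std::size_t i = 0; i < n; ++i) {
    tasks.emplace("framework-1", "task-" + std::to_string(i));
  }
  return tasks;
}
```

With n tasks under one framework, visitsPerEntry performs n * n inner iterations while visitsPerDistinctKey performs n, which would match a master that becomes CPU-bound when agents with many unreachable tasks reregister.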

Issue Links

is related to

MESOS-5037 foreachkey behaviour is not expected in multimap

Open

People

Assignee: Benjamin Mahler

Reporter: haosdent

Votes: 0

Watchers: 3

Dates

Created: 12/Jul/19 18:05

Updated: 04/Oct/19 19:24

Resolved: 04/Oct/19 17:47