Details
- Type: Bug
- Status: Accepted
- Priority: Critical
- Resolution: Unresolved
- Affects Version/s: 1.1.1, 1.1.2
- None
- None
Description
A check in sorter.cpp#L355 in 1.1.2 is occasionally triggered in our cluster and crashes the leading master.
I manually modified that check to print out the related variables; the resulting master log is here:
https://gist.github.com/zhitaoli/0662d9fe1f6d57de344951c05b536bad#file-gistfile1-txt
From the log, it seems the check was using a stale revocable CPU value of 26 after the value had already been updated to 25, so the check failed and the master crashed.
So far, both verified occurrences of this bug were observed near an UNRESERVE operation (see the preceding lines in the log).
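To make the suspected failure mode concrete, here is a minimal sketch of how a containment check can fail when the caller holds a stale quantity. It is not the code at sorter.cpp#L355: `AllocationLedger`, its methods, and the resource name string are all hypothetical, and the check returns `false` where the real code would abort the process.

```cpp
#include <map>
#include <string>

// Hypothetical stand-in for the sorter's bookkeeping; this is NOT the
// actual Mesos data structure, only an illustration of the failure mode.
struct AllocationLedger {
  std::map<std::string, double> allocated;  // resource name -> quantity

  void add(const std::string& name, double amount) {
    allocated[name] += amount;
  }

  // Overwrites the tracked quantity, as an in-place update would.
  void update(const std::string& name, double amount) {
    allocated[name] = amount;
  }

  // The kind of precondition a CHECK would enforce: the ledger must hold
  // at least `amount` of `name` before that amount is subtracted.
  bool contains(const std::string& name, double amount) const {
    auto it = allocated.find(name);
    return it != allocated.end() && it->second >= amount;
  }

  // Returns false where the real code would crash on a failed CHECK.
  bool remove(const std::string& name, double amount) {
    if (!contains(name, amount)) {
      return false;
    }
    allocated[name] -= amount;
    return true;
  }
};

// Replays the sequence suggested by the log: the caller captures 26, the
// ledger is updated to 25, and the removal then uses the stale 26.
bool staleRemovalSucceeds() {
  AllocationLedger ledger;
  ledger.add("cpus(revocable)", 26.0);
  const double snapshot = 26.0;             // value captured by the caller
  ledger.update("cpus(revocable)", 25.0);   // ledger changes afterwards
  return ledger.remove("cpus(revocable)", snapshot);
}

// Same sequence, but the caller re-reads the current value before removing.
bool freshRemovalSucceeds() {
  AllocationLedger ledger;
  ledger.add("cpus(revocable)", 26.0);
  ledger.update("cpus(revocable)", 25.0);
  const double current = ledger.allocated.at("cpus(revocable)");
  return ledger.remove("cpus(revocable)", current);
}
```

In this sketch `staleRemovalSucceeds()` fails the containment check because 25 is less than the stale 26, while `freshRemovalSucceeds()` passes. Whether the real crash follows this exact ordering is an assumption based only on the values printed in the log.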
Issue Links
- is related to
  - MESOS-6596 Dynamic reservation endpoint returns 409s (Open)
  - MESOS-4553 Manage offers in allocator. (Accepted)
- relates to
  - MESOS-7639 Oversubscription could crash the master due to CHECK failure in the allocator (Resolved)