Description
In the agent we have the following retry backoff calculation logic:
    Duration backoff =
      flags.authentication_backoff_factor * std::pow(2, failedAuthentications);
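Since `Duration` uses an `int64_t` to hold nanoseconds, setting `authentication_backoff_factor` to 1 second causes an overflow once `failedAuthentications` reaches 34: converting seconds to nanoseconds costs about 30 bits (10^9 ≈ 2^30), and the `pow()` term contributes another 34 bits (2^34), which together exceed the 63 value bits of an `int64_t`.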
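The arithmetic above can be checked with a small standalone sketch. It does not use the Mesos `Duration` type; it only assumes a signed 64-bit nanosecond count, and `firstOverflowingExponent` is a hypothetical helper name introduced for illustration.

```cpp
#include <cstdint>
#include <limits>

// Hypothetical helper, not part of Mesos: returns the smallest exponent n
// for which factorNs * 2^n no longer fits in a signed 64-bit nanosecond
// count. Assumes factorNs > 0.
int firstOverflowingExponent(int64_t factorNs)
{
  const int64_t max = std::numeric_limits<int64_t>::max();

  int n = 0;
  int64_t value = factorNs;

  // Doubling `value` is safe only while it is at most half of the maximum.
  while (value <= max / 2) {
    value *= 2;
    ++n;
  }

  // `value` now holds factorNs * 2^n, the last representable product,
  // so the next exponent is the first one that overflows.
  return n + 1;
}
```

Calling `firstOverflowingExponent(1000000000)` (1 second expressed in nanoseconds) returns 34, matching the count stated above.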
The effect is that we do not back off at all; we just retry immediately after the 5s timeout:
https://github.com/apache/mesos/blob/874c752316b14055c0a5a7b67f97ccf912abcc3c/src/master/master.cpp#L9615-L9619
The scheduler driver also has the same issue.
We should also audit all the other backoff logic.
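One possible shape for overflow-safe backoff logic is sketched below. This is an illustration under stated assumptions, not the fix adopted in Mesos: it uses `std::chrono::nanoseconds` in place of the Mesos `Duration` type, and both `cappedBackoff` and its `maxBackoff` parameter are hypothetical names.

```cpp
#include <algorithm>
#include <chrono>
#include <cstdint>

// Hypothetical helper, not part of Mesos: computes factor * 2^failures,
// saturating at `maxBackoff` instead of overflowing the 64-bit nanosecond
// count. Assumes factor > 0 and maxBackoff > 0.
std::chrono::nanoseconds cappedBackoff(
    std::chrono::nanoseconds factor,
    uint64_t failures,
    std::chrono::nanoseconds maxBackoff)
{
  std::chrono::nanoseconds backoff = factor;

  for (uint64_t i = 0; i < failures; ++i) {
    // Stop before doubling would exceed the cap. Because the cap itself is
    // representable, this check also rules out integer overflow.
    if (backoff > maxBackoff / 2) {
      return maxBackoff;
    }
    backoff *= 2;
  }

  // Covers the case where `factor` alone already exceeds the cap.
  return std::min(backoff, maxBackoff);
}
```

With a factor of 1 second and a cap of 60 seconds, 5 failures yield 32 seconds and any count from 6 upward (including 34) yields 60 seconds. The loop returns as soon as the cap is reached, so a large failure count costs at most about 63 iterations.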
Issue Links
- is related to
  - MESOS-9144 Master authentication handling leads to request amplification. (Resolved)
  - MESOS-9145 Master has a fragile burned-in 5s authentication timeout. (Resolved)
- relates to
  - MESOS-9146 Agent has a fragile burn-in 5s authentication timeout. (Resolved)