Description
In `Master::authenticate(const UPID& from, const UPID& pid)`, we have:
https://github.com/apache/mesos/blob/master/src/master/master.cpp#L9594
```cpp
if (authenticating.contains(pid)) {
  LOG(INFO) << "Queuing up authentication request from " << pid
            << " because authentication is still in progress";

  // Try to cancel the in progress authentication by discarding the
  // future.
  authenticating[pid].discard();

  // Retry after the current authenticator session finishes.
  authenticating[pid]
    .onAny(defer(self(), &Self::authenticate, from, pid));

  return;
}
```
Suppose the master is processing authentication request R1, whose associated future is held in `authenticating`. It then receives another request R2 from the same client (e.g., due to a retry). Following the code above, the master will (1) discard R1, and (2) register R2 as a callback that fires once R1's future completes (here, because it was discarded), at which point `::authenticate` is rerun for R2.
Here the master assumes that R2 is the most recent request. That holds in the example above, but the assumption easily breaks when authentication requests arrive faster than in-progress sessions are discarded. If three requests (R1, R2, R3) are in the event queue, `::authenticate` can be triggered six times in total: once for R1, twice for R2, and three times for R3. The number of calls grows quadratically with the number of enqueued requests, and the master can be overwhelmed.
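A short count shows where the quadratic growth comes from, assuming all n requests are queued before the first discarded session finishes. Write c_k for the number of `::authenticate` calls made for request R_k and C(n) for the total (both symbols are introduced here for illustration). R_k runs once on arrival and is re-registered on each session started by R_1, ..., R_{k-1}, retrying once as each of those sessions finishes:

```latex
c_k = 1 + (k - 1) = k, \qquad
C(n) = \sum_{k=1}^{n} c_k = \frac{n(n+1)}{2} \in \Theta(n^2), \qquad
C(3) = 1 + 2 + 3 = 6.
```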
Combined with MESOS-9146 and MESOS-9145, this issue makes master authentication fragile and easily overwhelmed.
Issue Links
- relates to:
  - MESOS-9145 Master has a fragile burned-in 5s authentication timeout. (Resolved)
  - MESOS-9146 Agent has a fragile burn-in 5s authentication timeout. (Resolved)
  - MESOS-9147 Agent and scheduler driver authentication retry backoff time could overflow. (Resolved)