For large clusters, the 0.23.0 allocator cannot keep up with the volume of slaves. After the following slave was re-registered, it took the allocator a long time to work through the backlog of slaves to add:
Some timings from a production cluster reveal that the allocator spends in the low tens of milliseconds on each call to addSlave and updateSlave; with tens of thousands of slaves, this amounts to the large delay seen above.
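For a rough sense of scale (illustrative round numbers, not additional measurements): at 20 ms per addSlave call, serially working through a backlog of 30,000 slaves would occupy the allocator for 20 ms × 30,000 = 600 s, i.e. roughly ten minutes before the last slave's resources become allocatable.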
We also saw a slow, steady increase in memory consumption, further hinting at a queue backing up in the allocator.
A synthetic benchmark, like the one we did for the registrar, would be prudent here, along with visibility into the allocator's queue size; a sketch of such a benchmark follows.
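As a starting point, here is a minimal C++ sketch of what such a benchmark could look like. The StubAllocator type, its simulated 20 ms per-call cost, and numSlaves are illustrative assumptions standing in for the real hierarchical allocator, not Mesos APIs:

```cpp
#include <chrono>
#include <iostream>
#include <thread>

// Hypothetical stand-in for the hierarchical allocator; a real benchmark
// would drive the actual allocator's addSlave path instead.
struct StubAllocator {
  void addSlave(int slaveId) {
    // Simulate the observed low-tens-of-milliseconds cost per call.
    std::this_thread::sleep_for(std::chrono::milliseconds(20));
    (void)slaveId;
  }
};

int main() {
  StubAllocator allocator;
  const int numSlaves = 1000;  // Scale toward tens of thousands to match production.

  const auto start = std::chrono::steady_clock::now();
  for (int i = 0; i < numSlaves; ++i) {
    allocator.addSlave(i);
  }
  const auto elapsed = std::chrono::steady_clock::now() - start;

  const auto totalMs =
      std::chrono::duration_cast<std::chrono::milliseconds>(elapsed).count();
  std::cout << "Added " << numSlaves << " slaves in " << totalMs << " ms ("
            << static_cast<double>(totalMs) / numSlaves
            << " ms per addSlave)" << std::endl;
  return 0;
}
```

Exposing the allocator's event queue length as a metric would complement such a benchmark, since it would make backlogs like the one above directly observable in production rather than inferred from memory growth.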