Currently discovery (base/oak/impl) uses the Sling scheduler to schedule the background jobs that periodically issue heartbeats, ping the topology connectors and check whether the view is current. It is very important that these jobs run at exactly the defined periods: delays of a few minutes can make them useless and ultimately destabilize the topology. The Sling scheduler uses a thread pool, which is by definition limited. If, for some reason, this pool is busy with other work, a job is not executed for some time. Consider the situation where all threads of the pool are busy with other (non-discovery) jobs when discovery wants to store a heartbeat or monitor the view: that is then not possible and gets delayed. If the delay is long enough for heartbeats to time out (or for changes to go unnoticed), the topology can break.
Thus, to avoid this, instead of relying on the size-bound scheduler, use a dedicated thread for these high-priority tasks.
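A minimal sketch of the idea, assuming plain `java.util.concurrent` rather than the Sling scheduler API. The class `DedicatedHeartbeatScheduler` and its methods are hypothetical names for illustration, not the actual discovery implementation. The demo contrasts a one-thread shared pool that is occupied by an unrelated long-running job (its heartbeat never gets to run) with a dedicated single thread that keeps firing on schedule.

```java
import java.util.concurrent.Executors;
import java.util.concurrent.ScheduledExecutorService;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.atomic.AtomicInteger;

// Hypothetical helper: runs periodic discovery tasks on one dedicated thread
// instead of a shared, size-bound pool.
public class DedicatedHeartbeatScheduler {

    private final ScheduledExecutorService executor;

    public DedicatedHeartbeatScheduler(String threadName) {
        // Single thread reserved for discovery; other jobs cannot occupy it.
        this.executor = Executors.newSingleThreadScheduledExecutor(runnable -> {
            Thread t = new Thread(runnable, threadName);
            t.setDaemon(true); // do not keep the JVM alive on shutdown
            return t;
        });
    }

    public void scheduleAtFixedRate(Runnable task, long periodMillis) {
        executor.scheduleAtFixedRate(() -> {
            try {
                task.run();
            } catch (RuntimeException e) {
                // An exception escaping a periodic task suppresses all of its
                // future runs, so log it and keep the schedule alive.
                System.err.println("discovery task failed: " + e);
            }
        }, 0, periodMillis, TimeUnit.MILLISECONDS);
    }

    public void shutdown() {
        executor.shutdownNow();
    }

    // Demo: a one-thread shared pool busy with an unrelated job versus a
    // dedicated thread. Returns {shared-pool heartbeats, dedicated heartbeats}.
    public static int[] demo() {
        ScheduledExecutorService sharedPool = Executors.newScheduledThreadPool(1);
        DedicatedHeartbeatScheduler dedicated = new DedicatedHeartbeatScheduler("discovery-heartbeat");
        AtomicInteger sharedBeats = new AtomicInteger();
        AtomicInteger dedicatedBeats = new AtomicInteger();

        // A long-running non-discovery job grabs the only shared thread first.
        sharedPool.submit(() -> {
            try {
                Thread.sleep(2000);
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt();
            }
        });
        // This heartbeat is queued behind the blocking job and starves.
        sharedPool.scheduleAtFixedRate(sharedBeats::incrementAndGet, 0, 100, TimeUnit.MILLISECONDS);
        // This heartbeat has its own thread and runs every 100 ms.
        dedicated.scheduleAtFixedRate(dedicatedBeats::incrementAndGet, 100);

        try {
            Thread.sleep(600);
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
        }
        int[] counts = {sharedBeats.get(), dedicatedBeats.get()};
        sharedPool.shutdownNow(); // interrupts the blocking job
        dedicated.shutdown();
        return counts;
    }

    public static void main(String[] args) {
        int[] counts = demo();
        System.out.println("shared pool heartbeats:      " + counts[0]);
        System.out.println("dedicated thread heartbeats: " + counts[1]);
    }
}
```

Note that a single dedicated thread only isolates discovery from other jobs: if one discovery task itself blocks (for example on a slow repository write), `scheduleAtFixedRate` starts the following runs late rather than concurrently, so the tasks sharing that thread should stay short.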