The good thing about this failure is that in every instance I've seen, we always have an overseer. It's just that the overseer is not one of the designates. I looked at the logs of a few failures, and it seemed that re-prioritization was still in progress and we timed out too early.
Here's a patch to harden the process. We have a max timeout of 300 seconds and a shorter 60-second timeout for finding designates, which is pushed further ahead each time we see a new overseer being elected. The idea is that if the overseer hasn't changed within 60 seconds, we're unlikely to get a new overseer and should stop. But if the overseer has changed, re-prioritization is in progress and we should keep waiting.
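
To make the timeout logic concrete, here is a minimal sketch of the sliding deadline described above. It is not the code from the patch: the class name, method signature, and the injected clock and pause hooks are all hypothetical, chosen only so the loop can be exercised without real sleeps. Only the 300-second cap and the 60-second window come from the description.

```java
import java.util.List;
import java.util.function.LongSupplier;
import java.util.function.Supplier;

/** Hypothetical sketch of the sliding-deadline wait; not the actual patch. */
final class OverseerWaitSketch {
    static final long MAX_WAIT_MS = 300_000;   // hard cap: 300 seconds
    static final long QUIET_WAIT_MS = 60_000;  // sliding window: 60 seconds

    /**
     * Polls until a designate is the overseer (returns true), or until either
     * the overseer has gone QUIET_WAIT_MS without changing or MAX_WAIT_MS has
     * elapsed in total (returns false).
     */
    static boolean waitForDesignate(Supplier<String> currentOverseer,
                                    List<String> designates,
                                    LongSupplier clockMs,
                                    Runnable pause) {
        long start = clockMs.getAsLong();
        long hardDeadline = start + MAX_WAIT_MS;
        long quietDeadline = start + QUIET_WAIT_MS;
        String last = null;
        while (true) {
            String overseer = currentOverseer.get();
            if (overseer != null && designates.contains(overseer)) {
                return true; // a designate has taken over
            }
            long now = clockMs.getAsLong();
            if (overseer != null && !overseer.equals(last)) {
                // The overseer changed (or was seen for the first time), so
                // re-prioritization may be in progress: push the window ahead.
                last = overseer;
                quietDeadline = now + QUIET_WAIT_MS;
            }
            if (now >= Math.min(hardDeadline, quietDeadline)) {
                return false; // no change for 60s, or 300s cap reached
            }
            pause.run(); // in real code this would sleep between polls
        }
    }
}
```

The design point is that the short window only ever moves forward when an election is observed, so a stuck cluster still fails after about 60 seconds, while a cluster that keeps electing new non-designate overseers is bounded by the 300-second cap.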