A new failure detector (FD) implementation was merged in commit 21b0f3d and is part of Kudu 1.5. One of the key changes is that the detection logic now runs on a reactor thread rather than on a dedicated per-replica thread. However, because reactor threads are shared, the election started when a failure is detected must be thunked to the Raft thread pool: starting an election means casting a vote, which generally means performing IO, which is verboten on a reactor thread.
Because the election is thunked, the FD rearms immediately; the previous implementation did not do this. If there's a lot of outstanding IO (e.g. during an election storm across thousands of tablets), the FD can fire again while the first election task is still waiting to cast its vote. The new election task will try to acquire the consensus lock and block on it, since it's held by the first election task. Each subsequent FD firing adds another blocked task in the same way. When the original IO finally completes, all of the follow-on elections are unblocked at the same time.
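The pile-up described above can be reproduced in miniature. The following is only an illustrative sketch using the C++ standard library, not Kudu code: FakeReplica, SimulatePileUp, and PileUpResult are hypothetical names, a plain std::mutex stands in for the consensus lock, one thread per task stands in for the Raft thread pool, and a promise stands in for the slow vote IO.

```cpp
#include <atomic>
#include <cassert>
#include <future>
#include <mutex>
#include <thread>
#include <vector>

// Hypothetical stand-in for a replica's consensus state (not Kudu's class).
struct FakeReplica {
  std::mutex consensus_lock;
  std::atomic<int> queued{0};         // follow-on tasks started, about to take the lock
  std::atomic<int> elections_run{0};  // election tasks that ran to completion
};

struct PileUpResult {
  int blocked_behind_first;  // follow-on tasks queued while the first did IO
  int elections_run;         // total elections executed once IO completed
};

// Simulates the FD firing `extra_fd_firings` more times while the first
// election task still holds the lock and waits on slow IO.
PileUpResult SimulatePileUp(int extra_fd_firings) {
  assert(extra_fd_firings >= 0);
  FakeReplica r;
  std::promise<void> io_done;       // fulfilled when the "vote IO" finishes
  std::shared_future<void> io = io_done.get_future().share();
  std::promise<void> first_locked;  // signals that the first task holds the lock
  std::future<void> first_locked_f = first_locked.get_future();

  std::vector<std::thread> pool;
  // First election task: takes the lock, then blocks on IO while holding it.
  pool.emplace_back([&] {
    std::lock_guard<std::mutex> l(r.consensus_lock);
    first_locked.set_value();
    io.wait();
    r.elections_run.fetch_add(1);
  });
  first_locked_f.wait();

  // The FD has rearmed, so each further firing submits another election task,
  // and each of those blocks on the lock held by the first task.
  for (int i = 0; i < extra_fd_firings; ++i) {
    pool.emplace_back([&] {
      r.queued.fetch_add(1);
      std::lock_guard<std::mutex> l(r.consensus_lock);
      r.elections_run.fetch_add(1);
    });
  }
  while (r.queued.load() < extra_fd_firings) std::this_thread::yield();
  int blocked = r.queued.load();

  io_done.set_value();  // the original IO completes; the backlog drains at once
  for (auto& t : pool) t.join();
  return PileUpResult{blocked, r.elections_run.load()};
}
```

Calling SimulatePileUp(5) models five extra FD firings during one slow vote: all five follow-on tasks queue up behind the first, and all six elections run once the IO completes.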