Details
Description
While running randomwalk tests with agitation for the 1.5.1 release, I have seen a backup master that is eligible to grab the master lock continue to wait for it. When this condition arises and the other master restarts, both masters wait for the lock and neither acquires it.
I cannot reproduce the problem reliably, and I think more investigation is needed to see what circumstances could be causing the problem.
Diagnosis and Work Around
This failure condition can occur on startup and on backup/active failover of the Master role. If the following log entry is the final entry in all Master logs, restart all Master roles, staggering the restarts by a few seconds.
[master.Master] INFO : trying to get master lock
If starting a cluster with multiple Master roles, you should stagger Master role starts by a few seconds.
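The log check described above could be scripted roughly as follows. This is only a sketch: the helper names and the idea of passing log contents in as a host-to-text mapping are assumptions for illustration, not part of Accumulo, and the match uses the end of the line because real log entries may carry a timestamp prefix.

```python
# Marker taken from the log entry quoted in this issue.
MARKER = "[master.Master] INFO : trying to get master lock"


def last_nonblank_line(log_text):
    """Return the last non-blank line of a log, stripped, or '' if none exists."""
    for line in reversed(log_text.splitlines()):
        if line.strip():
            return line.strip()
    return ""


def all_masters_stuck(logs):
    """Return True only if every Master log ends with the lock-wait entry.

    `logs` is a hypothetical mapping of host name -> full log text.
    An empty mapping returns False, since there is nothing to diagnose.
    """
    if not logs:
        return False
    return all(last_nonblank_line(text).endswith(MARKER) for text in logs.values())
```

If `all_masters_stuck` returns True for the logs collected from every Master host, the workaround above applies: restart all Master roles, a few seconds apart.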