  1. Accumulo
  2. ACCUMULO-2422

Backup master can miss acquiring lock when primary exits

    Details

      Description

      While running randomwalk tests with agitation for the 1.5.1 release, I've seen situations where a backup master that is eligible to grab the master lock continues to wait. When this condition arises and the other master restarts, both wait for the lock without success.

      I cannot reproduce the problem reliably; more investigation is needed to determine what circumstances cause it.

      Diagnosis and Workaround

      This failure condition can occur at startup and on backup/active failover of the Master role. If the following log entry is the final entry in all Master logs, restart all Master roles, staggering the restarts by a few seconds.

      [master.Master] INFO : trying to get master lock
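
The diagnosis above can be automated with a small check. This is a minimal sketch, not part of Accumulo: the function names and the `logs_by_host` mapping are hypothetical, and it assumes you have already read each Master log into a string. It only looks at the last non-blank line, so a log that ends in a multi-line entry such as a stack trace will not match.

```python
# Message taken from the log entry quoted in this issue.
LOCK_WAIT_MESSAGE = "trying to get master lock"


def last_log_entry(log_text):
    """Return the last non-blank line of a log, or None if there is none."""
    lines = [line.strip() for line in log_text.splitlines() if line.strip()]
    return lines[-1] if lines else None


def is_waiting_for_lock(log_text):
    """True if the final entry of this log is the lock-wait message."""
    entry = last_log_entry(log_text)
    return entry is not None and LOCK_WAIT_MESSAGE in entry


def all_masters_stuck(logs_by_host):
    """True if every Master log ends with the lock-wait message.

    logs_by_host is a hypothetical mapping of host name -> log text.
    An empty mapping is treated as "not stuck".
    """
    if not logs_by_host:
        return False
    return all(is_waiting_for_lock(text) for text in logs_by_host.values())
```

If `all_masters_stuck` returns True, the workaround described in this issue applies: restart all Master roles, staggered by a few seconds.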
      

      When starting a cluster with multiple Master roles, stagger the Master role starts by a few seconds.
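
The staggering itself can be sketched as a simple loop. This is an illustration under stated assumptions: `start_master` is a hypothetical callable that you supply to start the Master role on one host (this issue does not prescribe a command), and the 5-second default is only one reading of "a few seconds".

```python
import time


def staggered_start(hosts, start_master, delay_seconds=5.0, sleep=time.sleep):
    """Start the Master role on each host, pausing between consecutive starts.

    start_master is a hypothetical, caller-supplied function taking a host name.
    delay_seconds is an assumed value; tune it for your cluster.
    sleep is injectable so the loop can be exercised without real waiting.
    """
    started = []
    for index, host in enumerate(hosts):
        if index > 0:
            # Wait before every start except the first one.
            sleep(delay_seconds)
        start_master(host)
        started.append(host)
    return started
```

The same loop covers the restart case from the diagnosis: stop all Master roles first, then call `staggered_start` with the list of Master hosts.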

              People

              • Assignee:
                bhavanki Bill Havanki
                Reporter:
                bhavanki Bill Havanki