ActiveMQ Artemis / ARTEMIS-2713

Master failback can trigger a useless quorum vote on slave failover


Details

    • Type: Bug
    • Status: Closed
    • Priority: Major
    • Resolution: Fixed
    • Affects Version/s: 2.11.0
    • Fix Version/s: 2.12.0
    • Component/s: Broker
    • Labels: None

    Description

      A shared-nothing replicated master-slave pair using check-for-live-server on the master and allow-failback on the slave can trigger one or more useless quorum votes during master restart.
      Whether the issue occurs depends on the timing of the messages exchanged between the pair. The slave, when restarting as a backup, performs these operations:

      1. async send STOP_CALLED on the connection with the master that is used to send the replica files (i.e. let's call it the replication connection)
      2. close all the connections with the master except the replication connection (sending a DISCONNECT on each closing one)
      3. async send FAIL_OVER on the replication connection (waiting 5 seconds before giving up and moving on)
      4. close the replication connection
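
The four steps above can be modelled with a small, self-contained Java sketch. It is not the Artemis implementation: `Connection`, `restartAsBackup` and `deliver` are hypothetical names, and each connection is reduced to a FIFO queue. The point it illustrates is that ordering is only guaranteed within a single connection, so the master may observe DISCONNECT before STOP_CALLED.

```java
import java.util.ArrayDeque;
import java.util.ArrayList;
import java.util.Deque;
import java.util.List;

// Simplified model (assumption): not the real Artemis classes.
final class BackupRestartSketch {
    enum Packet { STOP_CALLED, DISCONNECT, FAIL_OVER }

    // Hypothetical stand-in for one connection: FIFO within itself only.
    static final class Connection {
        final String name;
        final Deque<Packet> inFlight = new ArrayDeque<>();
        boolean closed;

        Connection(String name) { this.name = name; }

        void sendAsync(Packet p) { inFlight.addLast(p); }

        void close() { closed = true; }
    }

    // The slave-side sequence from the description, steps 1 to 4.
    static void restartAsBackup(Connection replication, List<Connection> others) {
        replication.sendAsync(Packet.STOP_CALLED);  // step 1
        for (Connection c : others) {               // step 2
            c.sendAsync(Packet.DISCONNECT);
            c.close();
        }
        replication.sendAsync(Packet.FAIL_OVER);    // step 3 (the 5 second wait is omitted)
        replication.close();                        // step 4
    }

    // Drains the connections in the given order and returns what the master observes.
    // Different orders model different network timings between connections.
    static List<Packet> deliver(List<Connection> drainOrder) {
        List<Packet> seen = new ArrayList<>();
        for (Connection c : drainOrder) {
            while (!c.inFlight.isEmpty()) {
                seen.add(c.inFlight.pollFirst());
            }
        }
        return seen;
    }

    public static void main(String[] args) {
        Connection replication = new Connection("replication");
        Connection other = new Connection("other");
        restartAsBackup(replication, List.of(other));
        // The other connection happens to be processed first by the master.
        System.out.println(deliver(List.of(other, replication)));
    }
}
```

Draining the replication connection first would instead yield STOP_CALLED ahead of DISCONNECT, which is the timing under which no extra vote is triggered.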

      The master could receive the DISCONNECT before STOP_CALLED (because they travel on different connections!) and so believe that the slave isn't going down intentionally: this makes it fire up to vote-retries quorum votes.
      Such a quorum vote (in the happy path) should "quickly" complete positively, letting the master fail over anyway, because the slave has already moved on and (ideally) the other brokers have had "enough time" to update their topologies too.
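
The master-side consequence of that ordering can be sketched as follows. This is a deliberately reduced model, assuming the master's decision depends only on whether STOP_CALLED has been seen before the first DISCONNECT; `triggersQuorumVote` is a hypothetical helper, not an Artemis API.

```java
import java.util.List;

// Simplified model (assumption): not the real Artemis decision logic.
final class MasterReactionSketch {
    enum Packet { STOP_CALLED, DISCONNECT, FAIL_OVER }

    // Returns true if, seeing the packets in this arrival order,
    // the master would consider the disconnection unintentional and start a quorum vote.
    static boolean triggersQuorumVote(List<Packet> arrivalOrder) {
        boolean backupAnnouncedStop = false;
        for (Packet p : arrivalOrder) {
            if (p == Packet.STOP_CALLED) {
                backupAnnouncedStop = true;
            } else if (p == Packet.DISCONNECT && !backupAnnouncedStop) {
                return true; // DISCONNECT seen without a prior STOP_CALLED
            }
        }
        return false;
    }

    public static void main(String[] args) {
        List<Packet> intended = List.of(Packet.STOP_CALLED, Packet.DISCONNECT, Packet.FAIL_OVER);
        List<Packet> raced = List.of(Packet.DISCONNECT, Packet.STOP_CALLED, Packet.FAIL_OVER);
        System.out.println("STOP_CALLED first, vote needed: " + triggersQuorumVote(intended));
        System.out.println("DISCONNECT first, vote needed: " + triggersQuorumVote(raced));
    }
}
```

Both arrival orders carry the same three packets; only their interleaving across connections decides whether the extra vote happens.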

      Although performing an additional quorum vote isn't a bad thing per se, it can create an unnecessarily long time window spent waiting for the observing cluster to update its topologies, slowing down an operation that is supposed to complete quickly (on the happy path).

            People

              Assignee: Unassigned
              Reporter: Francesco Nigro (nigro.fra@gmail.com)
              Votes: 0
              Watchers: 2

              Dates

                Created:
                Updated:
                Resolved:

                Time Tracking

                  Estimated: Not Specified
                  Remaining: 0h
                  Logged: 0.5h