ActiveMQ Artemis / ARTEMIS-3430

Activation Sequence Auto-Repair

Details

    • Type: Bug
    • Status: Resolved
    • Priority: Major
    • Resolution: Fixed
    • Affects Version/s: None
    • Fix Version/s: 2.19.0
    • Component/s: None
    • Labels: None

    Description

      This can be seen either as a bug or as an improvement over the existing self-healing behaviour of the activation sequence introduced by https://issues.apache.org/jira/browse/ARTEMIS-3340.

      In short, the existing protocol to increase the activation sequence while un-replicated is:

      1. remote i -> -(i + 1), i.e. remote CLAIM
      2. local i -> (i + 1), i.e. local commit
      3. remote -(i + 1) -> (i + 1), i.e. remote COMMIT
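
The three steps above can be sketched as follows. This is only an illustrative model, not the Artemis implementation: `CoordinatedSequence`, `compare_and_set` and `increase_activation_sequence` are hypothetical names, and the broker's durable local sequence is represented by a one-element list.

```python
class CoordinatedSequence:
    """Hypothetical stand-in for the remotely coordinated activation sequence."""

    def __init__(self, value=0):
        self.value = value

    def compare_and_set(self, expected, new):
        # Update the value only if it still matches what the caller expects.
        if self.value != expected:
            return False
        self.value = new
        return True


def increase_activation_sequence(remote, local):
    """Run CLAIM, local commit and COMMIT; return the new committed value.

    `local` is a one-element list standing in for the broker's local sequence.
    """
    i = local[0]
    # 1. remote CLAIM: i -> -(i + 1)
    if not remote.compare_and_set(i, -(i + 1)):
        raise RuntimeError("remote sequence is not at the expected value")
    # 2. local commit: i -> (i + 1)
    local[0] = i + 1
    # 3. remote COMMIT: -(i + 1) -> (i + 1)
    if not remote.compare_and_set(-(i + 1), i + 1):
        raise RuntimeError("claimed value changed before COMMIT")
    return i + 1
```

A negative coordinated value therefore marks a claim that has not been committed yet, which is the state a partial failure can leave behind.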

      This protocol is designed to let witness brokers recognize when their data is no longer up to date, and to spare them from throwing it away while it is still valuable, if a partial failure occurs while the activation sequence is being increased.

      In the current version, self-repair is allowed only if the live broker has performed step 2 but not step 3, i.e. the local activation sequence is updated but the coordinated one isn't committed yet.
      If the failing broker is restarted, it can "fix" the coordinated sequence and go on to become live again. But if step 2 fails (or simply never happens), the coordinated activation sequence can only be fixed through admin intervention, after inspecting all local activation sequences.
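
The two failure cases described above can be told apart by comparing the restarting broker's local sequence with the coordinated one. The function below is a minimal sketch of that pre-fix decision; `restart_outcome` and its return labels are hypothetical and do not come from the Artemis code base.

```python
def restart_outcome(local, coordinated):
    """Classify what a restarting broker finds, under the pre-fix rules.

    A negative `coordinated` value means a CLAIM that was never committed.
    """
    if coordinated >= 0:
        # No pending claim: nothing to repair.
        return "committed"
    claimed = -coordinated
    if local == claimed:
        # Step 2 happened, step 3 did not: the broker can commit the claim.
        return "self-repair"
    # Step 2 failed or never happened: the claim stays pending.
    return "admin-intervention"
```

For example, with a coordinated value of -4, a broker whose local sequence is 4 can finish the commit itself, while a broker still at 3 cannot.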

      Other brokers cannot "fix" the sequence because the local sequence of the failed broker is unknown: simply rolling back the claimed value (to the previous value or to the right committed value) can make the failed broker believe that it has up-to-date data too, causing journal misalignments.

      The solution is to fix the claimed sequence by moving it to the right commit value, while forbidding other brokers from running un-replicated with that value.
      This is achieved by increasing the sequence further once it has been repaired: this prematurely ages the other in-sync brokers (including the failed one), but allows auto-repair without admin intervention.
      The sole drawback of this strategy is that a further failure of the repairing broker while it is increasing the sequence gives that broker exclusive responsibility for auto-repair (again, on restart), because no other broker can have a high enough local sequence.
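
The repair strategy can be illustrated with a small sketch. It assumes, as a simplification not stated in the issue, that a broker may run un-replicated only when its local sequence equals a committed (non-negative) coordinated value. `auto_repair` and `may_run_unreplicated` are hypothetical names, and no eligibility check on the repairing broker is modelled.

```python
def auto_repair(coordinated):
    """Return the coordinated values the repairing broker steps through."""
    if coordinated >= 0:
        raise ValueError("nothing to repair: sequence is not claimed")
    committed = -coordinated          # repair: -(i + 1) -> (i + 1)
    claimed_again = -(committed + 1)  # CLAIM for the further increase
    final = committed + 1             # COMMIT of the further increase
    return [committed, claimed_again, final]


def may_run_unreplicated(local, coordinated):
    # Assumed rule: local data is current only if it matches a committed value.
    return coordinated >= 0 and local == coordinated
```

Starting from a pending claim of -4, the coordinated value ends at 5. A failed broker whose local sequence is 3 or 4 no longer matches it, so only the repairing broker, whose local sequence is now 5, may run un-replicated. If the repairing broker fails while the value is -5, no other broker can reach a local sequence of 5, which is the drawback noted above.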

          People

            nigrofranz Francesco Nigro