  1. Qpid
  2. QPID-2220

Assisting manual recovery from a complete persistent cluster crash.


Details

    • Type: Improvement
    • Status: Closed
    • Priority: Major
    • Resolution: Won't Fix
    • Affects Version/s: 0.5
    • Fix Version/s: None
    • Component/s: C++ Broker
    • Labels: None

    Description

      If every member of a persistent cluster crashes then manual intervention is required to identify which store is most up-to-date, so it can be used to recover. We need to provide tools to assist in this identification.

      The cluster can save a config-change counter with each config change (cluster membership change). In recovery, the broker with the highest config-change counter has the best store.

      However, if the last brokers in the cluster crash so close together that none of them can record a config-change, we need an additional decider.
      The store at http://qpidcomponents.org/download.html#persistence maintains a global Persistence ID (PID), a 64-bit value that is incremented for each enqueue and each dequeue. If the cluster stores (config-change, PID) pairs, then in recovery we can use the difference between the store's actual PID and the PID recorded with the last config-change as a tiebreaker.
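
The selection rule described above can be sketched as follows. This is only an illustration, not code from the broker: the StoreStatus struct, its field names and the pickBestStore helper are hypothetical, and it assumes that among stores with equal config-change counters the one with the larger PID difference is the more up-to-date.

```cpp
#include <cstdint>
#include <string>
#include <vector>

// Hypothetical summary of what one crashed broker's store reports.
struct StoreStatus {
    std::string broker;        // illustrative broker name
    uint64_t configChange;     // last config-change counter saved in the store
    uint64_t configChangePid;  // PID saved together with that config-change
    uint64_t actualPid;        // PID the store holds now
};

// Number of store updates since the last recorded config-change.
// Unsigned subtraction still gives the right distance if the PID
// has wrapped around once in between.
inline uint64_t pidDelta(const StoreStatus& s) {
    return s.actualPid - s.configChangePid;
}

// True if a looks more up-to-date than b: a higher config-change counter
// wins, and the PID difference breaks ties (assumption: larger is newer).
inline bool moreRecent(const StoreStatus& a, const StoreStatus& b) {
    if (a.configChange != b.configChange)
        return a.configChange > b.configChange;
    return pidDelta(a) > pidDelta(b);
}

// Returns the store that looks most up-to-date, or nullptr if none given.
inline const StoreStatus* pickBestStore(const std::vector<StoreStatus>& stores) {
    const StoreStatus* best = nullptr;
    for (const StoreStatus& s : stores) {
        if (!best || moreRecent(s, *best)) best = &s;
    }
    return best;
}
```

If every store reports the same counter and the same PID difference, the sketch keeps the first one it sees; a real tool would probably want to report that case to the operator instead.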

      Proposed change to MessageStore API:

          /** Returns a monotonically increasing value reflecting changes to the store.
           * The value can wrap-around to 0.
           * Stores need not implement this function, they can simply return 0.
           */
          uint64_t getChangeCounter();

      The default implementation simply returns 0, in which case the cluster must fall back to relying on config-change counts alone.
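
A minimal sketch of how the default and an overriding store might look. The MessageStore class here is a cut-down stand-in, not the broker's real interface, and CountingStore with its enqueue/dequeue methods is purely hypothetical.

```cpp
#include <cstdint>

// Cut-down stand-in for the store interface; only the proposed
// function is shown.
class MessageStore {
  public:
    virtual ~MessageStore() {}

    // Default: no change tracking, so the cluster can only compare
    // config-change counts.
    virtual uint64_t getChangeCounter() { return 0; }
};

// Hypothetical store that bumps a 64-bit counter on every enqueue
// and dequeue, in the spirit of the Persistence ID described above.
class CountingStore : public MessageStore {
  public:
    void enqueue() { ++pid_; }
    void dequeue() { ++pid_; }
    uint64_t getChangeCounter() override { return pid_; }

  private:
    uint64_t pid_ = 0;  // wraps to 0 on overflow, as the API comment allows
};
```

With a store that always returns 0, every (config-change, PID) pair has a PID difference of 0, so the tiebreaker never distinguishes between stores.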

          People

            aconway Alan Conway