QPID-4082

cluster de-sync after broker restart & queue replication

    Details

    • Type: Bug
    • Status: Closed
    • Priority: Minor
    • Resolution: Fixed
    • Affects Version/s: 0.16
    • Fix Version/s: 0.17
    • Component/s: C++ Clustering
    • Labels:

      Description

      Description of problem:
      With queue state replication configured between two clusters, restarting a broker in both the source and the destination cluster sometimes leads to a cluster de-sync. No QMF communication is involved, although the symptoms resemble those of the bug caused by QMF errors not being propagated within a cluster.

      Version-Release number of selected component (if applicable):
      Spotted in qpid 0.14; also expected in 0.16.

      How reproducible:
      100% within 10 minutes.

      Steps to Reproduce:
      1. Set up a 2-node source cluster and a 2-node destination cluster (see the reproducer for an example config and for a script that performs the remaining steps).
      2. Set up queue state replication between the clusters.
      3. Randomly stop or start a broker in either cluster, such that both clusters always have at least 1 node running (i.e. stop and start only non-elder brokers).
      4. After each stop or start, send 1 message to the source broker, to a queue that is replicated.
      5. Wait some time.
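
      The stop/start pattern in step 3 can be sketched as a small schedule generator. This is only an illustration, not the attached reproducer: the broker names and the `toggle_schedule` helper are hypothetical, and the first broker of each cluster is assumed to be the elder that is never stopped. It makes no calls to a real broker.

```python
import random

# Hypothetical broker names. The first entry of each cluster is assumed to be
# the elder and is never stopped, so each cluster always keeps one node running.
CLUSTERS = {
    "src": ["src-1", "src-2"],
    "dst": ["dst-1", "dst-2"],
}


def toggle_schedule(steps, seed=None):
    """Return a list of (action, broker) pairs for a randomized run.

    Each step flips one randomly chosen non-elder broker: a running broker
    is stopped, a stopped one is started. After each pair, a reproducer
    would send one message to the replicated queue on the source cluster.
    """
    rng = random.Random(seed)
    running = {b for brokers in CLUSTERS.values() for b in brokers}
    candidates = [b for brokers in CLUSTERS.values() for b in brokers[1:]]
    schedule = []
    for _ in range(steps):
        broker = rng.choice(candidates)
        if broker in running:
            running.remove(broker)
            schedule.append(("stop", broker))
        else:
            running.add(broker)
            schedule.append(("start", broker))
    return schedule
```

      Calling `toggle_schedule(10, seed=1)` yields ten stop/start actions that a driver script could map onto whatever commands it uses to control the brokers.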

      Actual results:
      The newly started broker in the source cluster may shut down after logging:
      2012-05-31 11:58:40 critical cluster(10.34.1.218:26715 READY/error) local error 502 did not occur on member 10.34.1.218:26294: invalid-argument: anonymous.b941dd87-3fa1-442d-99f7-8c0907599b30: confirmed < (24+0) but only sent < (23+0) (qpid/SessionState.cpp:154)
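
      When scanning broker logs for this failure, the interesting fields of the line above can be pulled out with a regular expression. The following is a minimal sketch: the pattern is written against this one log line only, the `parse_desync` helper is hypothetical, and reading `(24+0)` as a command number plus an offset is an assumption.

```python
import re

LOG_LINE = (
    "2012-05-31 11:58:40 critical cluster(10.34.1.218:26715 READY/error) "
    "local error 502 did not occur on member 10.34.1.218:26294: "
    "invalid-argument: anonymous.b941dd87-3fa1-442d-99f7-8c0907599b30: "
    "confirmed < (24+0) but only sent < (23+0) (qpid/SessionState.cpp:154)"
)

# Pattern derived from the single log line above; other broker versions
# may format the message differently.
PATTERN = re.compile(
    r"cluster\((?P<local>\S+) \S+\) local error (?P<code>\d+) "
    r"did not occur on member (?P<member>[^:\s]+:\d+): .*?"
    r"confirmed < \((?P<confirmed>\d+)\+\d+\) "
    r"but only sent < \((?P<sent>\d+)\+\d+\)"
)


def parse_desync(line):
    """Return the parsed fields as a dict, or None if the line does not match."""
    m = PATTERN.search(line)
    if m is None:
        return None
    fields = m.groupdict()
    for key in ("code", "confirmed", "sent"):
        fields[key] = int(fields[key])
    return fields


fields = parse_desync(LOG_LINE)
# The session was told that more commands were confirmed than it had sent.
print(fields["confirmed"] > fields["sent"])
```

      For the line in this report, the confirmed point (24) is ahead of the sent point (23), which is the inconsistency the broker rejects as an invalid argument.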

      Expected results:
      No such error

      Additional info:

      • The affected session is always the federation route used for the queue state replication.
      • Stopping and starting both one source and one destination broker is essential to the scenario; e.g. without (re)starting a destination broker, no error occurs.
      • A scenario that is sometimes almost deterministic:
        1) start everything, send a message
        2) stop a dst.broker, send a message
        3) stop a src.broker, send a message
        4) start the src.broker, then the dst.broker
        5) wait some time (e.g. 10 seconds) and send a message
        Sometimes I got the error instantly, sometimes never.
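
      The five-step scenario can also be written down as data and replayed against a simple membership model, for example to confirm that both clusters keep a running node at every step. This is a sketch under assumptions: the node names and the `replay` helper are hypothetical, and nothing here talks to a real broker.

```python
# Hypothetical node names; the "-2" nodes stand for the non-elder brokers.
SCENARIO = [
    ("send", None),        # 1) everything is running, send a message
    ("stop", "dst-2"),     # 2) stop a destination broker ...
    ("send", None),        #    ... and send a message
    ("stop", "src-2"),     # 3) stop a source broker ...
    ("send", None),        #    ... and send a message
    ("start", "src-2"),    # 4) start the source broker first,
    ("start", "dst-2"),    #    then the destination broker
    ("wait", 10),          # 5) wait about 10 seconds ...
    ("send", None),        #    ... and send a message
]


def replay(scenario):
    """Replay the scenario and return (action, arg, running nodes) per step."""
    running = {"src-1", "src-2", "dst-1", "dst-2"}
    trace = []
    for action, arg in scenario:
        if action == "stop":
            running.discard(arg)
        elif action == "start":
            running.add(arg)
        # Both clusters must keep at least one running member at every step.
        assert any(n.startswith("src") for n in running)
        assert any(n.startswith("dst") for n in running)
        trace.append((action, arg, sorted(running)))
    return trace
```

      A driver could walk the returned trace and translate each `stop`, `start`, `wait` and `send` entry into its own broker-control and message-sending commands.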

      Patch to be proposed.

      1. QPID-4082.patch
        4 kB
        Pavel Moravec

          People

          • Assignee: Alan Conway
          • Reporter: Pavel Moravec
          • Votes: 0
          • Watchers: 2
