[QPID-4082] cluster de-sync after broker restart & queue replication - ASF JIRA

Details

Type: Bug
Status: Closed
Priority: Minor
Resolution: Fixed
Affects Version/s: 0.16
Fix Version/s: 0.17
Component/s: C++ Clustering
Labels:
- patch

Description

Description of problem:
With queue state replication configured between two clusters, restarting a broker in both the source and the destination cluster sometimes leads to a cluster de-sync. No QMF communication is involved, though the symptoms resemble those of the bug caused by missing propagation of QMF errors within a cluster.

Version-Release number of selected component (if applicable):
Spotted in qpid 0.14; also expected in 0.16.

How reproducible:
100% within 10 minutes.

Steps to Reproduce:
1. Set up a 2-node source cluster and a 2-node destination cluster (see the reproducer for an example configuration and for a script that performs the following steps).
2. Configure queue state replication between the clusters.
3. Randomly stop or start a broker in a cluster, such that both clusters always have at least one node running (i.e. stop and start only non-elder brokers).
4. After each stop or start, send one message to a source broker, to a queue that is replicated.
5. Wait some time.
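
Steps 3 and 4 can be sketched as a small action generator. This is a minimal Python sketch, not the reproducer script mentioned above: the function name `random_actions`, the cluster names and the convention that node 0 is the elder are all assumptions made for illustration. Starting or stopping the brokers and sending the message is left to the caller.

```python
import random


def random_actions(cluster_sizes, steps, seed=None):
    """Yield (cluster, node, action) tuples for a random stop/start run.

    Assumption: node 0 of every cluster is the elder and is never touched,
    so each cluster always keeps at least one running broker.
    Every cluster needs at least 2 nodes.
    """
    rng = random.Random(seed)
    # True means the node is currently running; everything starts up.
    running = {name: [True] * size for name, size in cluster_sizes.items()}
    for _ in range(steps):
        cluster = rng.choice(sorted(running))
        node = rng.randrange(1, len(running[cluster]))  # skip node 0 (elder)
        action = "stop" if running[cluster][node] else "start"
        running[cluster][node] = not running[cluster][node]
        yield cluster, node, action


if __name__ == "__main__":
    for cluster, node, action in random_actions({"src": 2, "dst": 2}, steps=6, seed=1):
        # After each action the caller would send one message to the
        # replicated queue on a source broker (step 4).
        print(action, cluster, node)
```

With two nodes per cluster, only node 1 of each cluster is ever stopped or started, which matches the "only non-elder brokers" restriction in step 3.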

Actual results:
The restarted broker in the source cluster may shut down after logging:
2012-05-31 11:58:40 critical cluster(10.34.1.218:26715 READY/error) local error 502 did not occur on member 10.34.1.218:26294: invalid-argument: anonymous.b941dd87-3fa1-442d-99f7-8c0907599b30: confirmed < (24+0) but only sent < (23+0) (qpid/SessionState.cpp:154)
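
The exception text reads like a violated sender-side invariant: the peer confirmed more than the local session had sent. Below is a minimal Python model of such a check, not the qpid code at `qpid/SessionState.cpp:154`. The class and method names are hypothetical, and reading `(24+0)` as a command counter plus an offset of 0 is an assumption.

```python
class SenderSketch:
    """Toy model of a session sender that tracks how much it has sent."""

    def __init__(self):
        self.send_point = 0  # number of commands sent so far

    def record_sent(self):
        """Account for one more command being sent."""
        self.send_point += 1

    def confirm(self, confirmed_point):
        """Accept a confirmation from the peer.

        A confirmation beyond what was sent is rejected, mirroring the
        wording of the error in the log above.
        """
        if confirmed_point > self.send_point:
            raise ValueError(
                f"confirmed < ({confirmed_point}+0) "
                f"but only sent < ({self.send_point}+0)"
            )


if __name__ == "__main__":
    sender = SenderSketch()
    for _ in range(23):
        sender.record_sent()
    sender.confirm(23)  # fine: nothing beyond the send point
    try:
        sender.confirm(24)  # one more than was sent
    except ValueError as err:
        print(err)
```

In the reported failure only one cluster member hits this condition, which is why the cluster reports that the local error "did not occur" on the other member.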

Expected results:
No such error

Additional info:

- The affected session is always the federation route used for the queue state replication.
- Stopping and starting one broker in both the source and the destination cluster is essential to the scenario; e.g. without (re)starting a destination broker, no error occurs.
- A scenario that is sometimes almost deterministic:
1) start everything, send a message
2) stop a destination broker, send a message
3) stop a source broker, send a message
4) start the source broker, then the destination broker
5) wait some time (e.g. 10 seconds) and send a message
Sometimes I got the error instantly, sometimes never.
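
Because the error shows up only on some runs, it helps to scan the broker log automatically after each attempt. The helper below is a hedged Python sketch: the function name `find_desync` is made up for illustration, and the regular expression is derived solely from the single log line quoted in this report, so other broker versions may word the message differently.

```python
import re

# Pattern built from the one log line quoted in this issue (an assumption
# about the general message format, not a documented qpid log format).
DESYNC_RE = re.compile(
    r"local error (\d+) did not occur on member (\S+): "
    r".*confirmed < \((\d+)\+\d+\) but only sent < \((\d+)\+\d+\)"
)


def find_desync(log_text):
    """Return details of the first de-sync error in log_text, or None."""
    match = DESYNC_RE.search(log_text)
    if match is None:
        return None
    code, member, confirmed, sent = match.groups()
    return {
        "code": int(code),
        "member": member,
        "confirmed": int(confirmed),
        "sent": int(sent),
    }


if __name__ == "__main__":
    line = (
        "2012-05-31 11:58:40 critical cluster(10.34.1.218:26715 READY/error) "
        "local error 502 did not occur on member 10.34.1.218:26294: "
        "invalid-argument: anonymous.b941dd87-3fa1-442d-99f7-8c0907599b30: "
        "confirmed < (24+0) but only sent < (23+0) (qpid/SessionState.cpp:154)"
    )
    print(find_desync(line))
```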

Patch to be proposed.

Attachments

QPID-4082.patch
21/Jun/12 07:02
4 kB
Pavel Moravec

People

Assignee: Alan Conway
Reporter: Pavel Moravec
Votes: 0
Watchers: 2

Dates

Created: 21/Jun/12 06:42
Updated: 29/Jul/13 18:53
Resolved: 21/Jun/12 16:08