Details
-
Bug
-
Status: Closed
-
Critical
-
Resolution: Fixed
-
0.14, 0.16
-
CentOS release 5.5 (Final)
Linux version 2.6.18-194.32.1.el5 (mockbuild@builder10.centos.org) (gcc version 4.1.2 20080704 (Red Hat 4.1.2-48)) #1 SMP Wed Jan 5 17:52:25 EST 2011
Description
There is a very strange problem that causes ordinary C++ clients to break C++ broker cluster authentication.
I have two brokers, let's call them gateway03 and gateway04. They are both configured to cluster using SSL and CRAM-MD5 (see their configuration files / environment below).
Scenario 1
This scenario works fine. I'm simply reporting it to show that it does work.
- I start the broker on gateway04.
- I start the broker on gateway03.
- gateway03 discovers the cluster.
- gateway03 creates a new persistent store.
- gateway03 receives updates from gateway04
- gateway03 caught-up
- gateway04 marks store as dirty (no longer last man standing).
- Start as many of my C++ clients as I want, and they all connect with no issues (note, I've included a sample C++ client that I used to test below, but the same happens with a number of fully-fledged C++ client applications I've written too).
This is all perfect!
Scenario 2
This is the one that reveals a problem
- I start the broker on gateway04.
- I don't start the broker on gateway03 yet.
- I start one or more C++ clients on gateway04 (or gateway03 telling the client to connect to gateway04) (see sample C++ client code below).
- C++ clients all connect correctly, with no issues.
- I start the broker on gateway03.
- gateway03 discovers the cluster.
- gateway03 creates a new persistent store.
- Up to this point the results of the two scenarios is the same, just the order is different.
- The broker on gateway03 reports "warning Client closed connection with 501: Desired mechanism(s) not valid: CRAM-MD5 (supported: ANONYMOUS PLAIN)" then "critical cluster(192.168.9.90:24784 UPDATEE) catch-up connection closed prematurely".
- At the same time gateway04 reports "warning Closing connection due to Desired mechanism(s) not valid: CRAM-MD5 (supported: ANONYMOUS PLAIN)}" then "error sending update: Desired mechanism(s) not valid: CRAM-MD5 (supported: ANONYMOUS PLAIN)"
- Then, of course, the broker on gateway03 shuts down.
Note, more complete broker / client logs are included below.
So, it seems that somehow once an ordinary C++ client (ie not another broker) connects to the broker, the broker no longer accepts CRAM-MD5 authentication from other joining brokers. But already-joined brokers participating in the cluster are not affected.
This may be the result of some misconfiguration on my part, but looks like a pretty serious clustering bug to me... ?
Some further notes:
- I don't know at this stage if the use of SSL has any affect (ie I haven't tried these same tests with SSL turned off).
- it doesn't matter which broker is started first - I can swap gateway03 and gateway04 in every instance, and the result is the same.
- I have a third broker that I use as part of this cluster, and it sees that same affect. If, for example, I have two brokers clustered, and a client connects, then the third broker will not be able to join the cluster for the same reason as above.
Broker Configuration
The gateway03 broker is configured as follows (gateway04 config is the same, but with 03 replaced with 04 where relevant).
cluster-name="mm-queue-cluster"
cluster-mechanism=CRAM-MD5
cluster-username=guest
cluster-password=guest
cluster-size=0
cluster-url=ssl:gateway03:5671
auth=yes
ssl-cert-db=/etc/qpid/certs/broker
ssl-cert-password-file=/etc/qpid/certs/pass.txt
ssl-cert-name=broker
require-encryption=yes
pwcheck_method: auxprop
auxprop_plugin: sasldb
sasldb_path: /var/lib/qpidd/qpidd.sasldb
mech_list: CRAM-MD5 DIGEST-MD5 PLAIN
sql_select: dummy select
export QPID_SSL_CERT_DB=/etc/qpid/certs/broker export QPID_SSL_CERT_NAME=broker export QPID_SSL_CERT_PASSWORD_FILE=/etc/qpid/certs/pass.txt
Sample Test Client
#include <iostream> #include <qpid/messaging/Connection.h> int main(int argc, char *argv[]) { qpid::types::Variant::Map options; options["username" ] = "guest"; options["password" ] = "guest"; qpid::messaging::Connection connection("ssl:gateway04:5671", options); connection.open(); std::cout << "sleeping..." << std::endl; sleep(300); return 0; }
export QPID_SSL_CERT_DB=/etc/qpid/certs/client export QPID_SSL_CERT_NAME=client export QPID_SSL_CERT_PASSWORD_FILE=/etc/qpid/certs/pass.txt
Scenario 1 Logs
This is the one where everything works as expected.
Feb 16 12:58:03 gateway04 qpidd[6749]: 2012-02-16 12:58:03 notice Journal "TplStore": Created Feb 16 12:58:03 gateway04 qpidd[6749]: 2012-02-16 12:58:03 notice Store module initialized; store-dir=/var/lib/qpidd Feb 16 12:58:03 gateway04 qpidd[6749]: 2012-02-16 12:58:03 notice Initializing CPG Feb 16 12:58:03 gateway04 qpidd[6749]: 2012-02-16 12:58:03 notice Cluster store state: clean cluster-id=63f8f108-5a60-4efe-8506-27385006c5fc shutdown-id=969b7c98-b4f6-4a01-93f4-962f3ab5f77d Feb 16 12:58:03 gateway04 qpidd[6749]: 2012-02-16 12:58:03 notice cluster(192.168.9.92:6749 PRE_INIT) configuration change: 192.168.9.92:6749 Feb 16 12:58:03 gateway04 qpidd[6749]: 2012-02-16 12:58:03 notice cluster(192.168.9.92:6749 PRE_INIT) Members joined: 192.168.9.92:6749 Feb 16 12:58:03 gateway04 qpidd[6749]: 2012-02-16 12:58:03 notice Journal "...": Created (queues being loading from persistent store...) Feb 16 12:58:03 gateway04 qpidd[6749]: 2012-02-16 12:58:03 notice Journal "...": Created Feb 16 12:58:03 gateway04 qpidd[6749]: 2012-02-16 12:58:03 notice Listening on TCP/TCP6 port 5672 Feb 16 12:58:03 gateway04 qpidd[6749]: 2012-02-16 12:58:03 notice Listening for SSL connections on TCP port 5671 Feb 16 12:58:03 gateway04 qpidd[6749]: 2012-02-16 12:58:03 notice cluster(192.168.9.92:6749 INIT) cluster-uuid = 63f8f108-5a60-4efe-8506-27385006c5fc Feb 16 12:58:03 gateway04 qpidd[6749]: 2012-02-16 12:58:03 notice cluster(192.168.9.92:6749 READY) joined cluster "mm-queue-cluster" Feb 16 12:58:03 gateway04 qpidd[6749]: 2012-02-16 12:58:03 notice Broker running (broker on gateway03 started here...) Feb 16 12:58:21 gateway04 qpidd[6749]: 2012-02-16 12:58:21 notice cluster(192.168.9.92:6749 READY) configuration change: 192.168.9.90:28152 192.168.9.92:6749 Feb 16 12:58:21 gateway04 qpidd[6749]: 2012-02-16 12:58:21 notice cluster(192.168.9.92:6749 READY) Members joined: 192.168.9.90:28152 Feb 16 12:58:21 gateway04 qpidd[6749]: 2012-02-16 12:58:21 notice cluster(192.168.9.92:6749 UPDATER) sending update to 192.168.9.90:28152 at amqp:ssl:gateway03:5671 Feb 16 12:58:50 gateway04 qpidd[6749]: 2012-02-16 12:58:50 warning Broker closed connection: 200, OK Feb 16 12:58:50 gateway04 qpidd[6749]: 2012-02-16 12:58:50 notice cluster(192.168.9.92:6749 UPDATER) update sent Feb 16 12:58:50 gateway04 qpidd[6749]: 2012-02-16 12:58:50 notice Running in a cluster, marking store dirty. (client connected here...)
(broker on gateway04 already started) Feb 16 12:58:21 gateway03 qpidd[28152]: 2012-02-16 12:58:21 notice Journal "TplStore": Created Feb 16 12:58:21 gateway03 qpidd[28152]: 2012-02-16 12:58:21 notice Store module initialized; store-dir=/var/lib/qpidd Feb 16 12:58:21 gateway03 qpidd[28152]: 2012-02-16 12:58:21 notice Initializing CPG Feb 16 12:58:21 gateway03 qpidd[28152]: 2012-02-16 12:58:21 notice Cluster store state: dirty cluster-id=63f8f108-5a60-4efe-8506-27385006c5fc Feb 16 12:58:21 gateway03 qpidd[28152]: 2012-02-16 12:58:21 notice cluster(192.168.9.90:28152 PRE_INIT) configuration change: 192.168.9.90:28152 192.168.9.92:6749 Feb 16 12:58:21 gateway03 qpidd[28152]: 2012-02-16 12:58:21 notice cluster(192.168.9.90:28152 PRE_INIT) Members joined: 192.168.9.90:28152 Feb 16 12:58:21 gateway03 qpidd[28152]: 2012-02-16 12:58:21 notice Cluster recovery: recovered journal data discarded and journal files pushed down Feb 16 12:58:21 gateway03 qpidd[28152]: 2012-02-16 12:58:21 notice Store directory /var/lib/qpidd/rhm was pushed down (saved) into directory /var/lib/qpidd/_cluster.bak.0004. Feb 16 12:58:21 gateway03 qpidd[28152]: 2012-02-16 12:58:21 notice Journal "TplStore": Created Feb 16 12:58:21 gateway03 qpidd[28152]: 2012-02-16 12:58:21 notice Listening on TCP/TCP6 port 5672 Feb 16 12:58:21 gateway03 qpidd[28152]: 2012-02-16 12:58:21 notice Listening for SSL connections on TCP port 5671 Feb 16 12:58:21 gateway03 qpidd[28152]: 2012-02-16 12:58:21 notice cluster(192.168.9.90:28152 INIT) cluster-uuid = 63f8f108-5a60-4efe-8506-27385006c5fc Feb 16 12:58:21 gateway03 qpidd[28152]: 2012-02-16 12:58:21 notice cluster(192.168.9.90:28152 JOINER) joining cluster "mm-queue-cluster" Feb 16 12:58:21 gateway03 qpidd[28152]: 2012-02-16 12:58:21 notice Broker running Feb 16 12:58:21 gateway03 qpidd[28152]: 2012-02-16 12:58:21 notice cluster(192.168.9.90:28152 UPDATEE) receiving update from 192.168.9.92:6749 Feb 16 12:58:21 gateway03 qpidd[28152]: 2012-02-16 12:58:21 notice Journal "...": Created (queues being added to persistent store...) Feb 16 12:58:49 gateway03 qpidd[28152]: 2012-02-16 12:58:49 notice Journal "...": Created Feb 16 12:58:50 gateway03 qpidd[28152]: 2012-02-16 12:58:50 notice cluster(192.168.9.90:28152 CATCHUP) update complete, starting catch-up. Feb 16 12:58:50 gateway03 qpidd[28152]: 2012-02-16 12:58:50 notice cluster(192.168.9.90:28152 READY) caught up.
(brokers on gateway03 and gateway04 already started) 2012-02-16 12:58:59 debug Created connection ssl:gateway04:5671 with {password:guest, username:guest} 2012-02-16 12:58:59 info Trying to connect to ssl:gateway04:5671... 2012-02-16 12:58:59 debug Created IO thread: 0 2012-02-16 12:58:59 debug SslConnector created for \x00- 2012-02-16 12:58:59 debug RECV [[59347 192.168.9.92:5671]]: INIT(0-10) 2012-02-16 12:58:59 debug CyrusSasl::start(PLAIN CRAM-MD5 DIGEST-MD5) 2012-02-16 12:58:59 debug external SSF detected and set to 128 2012-02-16 12:58:59 debug external auth detected and set to dummy 2012-02-16 12:58:59 debug min_ssf: 0, max_ssf: 256 2012-02-16 12:58:59 debug CyrusSasl::start(PLAIN CRAM-MD5 DIGEST-MD5): selected DIGEST-MD5 response: '' 2012-02-16 12:58:59 debug getUserFromSettings(): guest 2012-02-16 12:58:59 debug CyrusSasl::step(nonce="zFwuPhC+6PbfNOHDqfuf4U6E2mE8E3ApAWqfMdNxTJ8=",realm="QPID",qop="auth",maxbuf=65535,charset=utf-8,algorithm=md5-sess): username="guest",realm="QPID",nonce="zFwuPhC+6PbfNOHDqfuf4U6E2mE8E3ApAWqfMdNxTJ8=",cnonce="a8bswbmo7ThWIV6o++rfJbMbC3PiF+EhxopFREICAHw=",nc=00000001,qop=auth,maxbuf=65535,digest-uri="qpidd/gateway04",response=55178f8db5c1a5df13aa98ac2582d873 2012-02-16 12:58:59 debug CyrusSasl::step(rspauth=645945230c3b9d71d00ef3221ce9514c): 2012-02-16 12:58:59 info Connection [59347 192.168.9.92:5671] connected to ssl:gateway04:5671 2012-02-16 12:58:59 debug Connection [59347 192.168.9.92:5671] no security layer in place 2012-02-16 12:58:59 info Connected to ssl:gateway04:5671 2012-02-16 12:58:59 debug Added known-hosts, reconnect-urls=[ssl:gateway04:5671, amqp:ssl:gateway03:5671, amqp:ssl:gateway04:5671] sleeping... 2012-02-16 12:58:59 debug Known-brokers for connection: amqp:ssl:gateway03:5671, amqp:ssl:gateway04:5671
Scenario 2 Logs
This is the one where the cluster SASL mechanism breaks.
Feb 16 13:09:32 gateway04 qpidd[7100]: 2012-02-16 13:09:32 notice Journal "TplStore": Created Feb 16 13:09:32 gateway04 qpidd[7100]: 2012-02-16 13:09:32 notice Store module initialized; store-dir=/var/lib/qpidd Feb 16 13:09:32 gateway04 qpidd[7100]: 2012-02-16 13:09:32 notice Initializing CPG Feb 16 13:09:32 gateway04 qpidd[7100]: 2012-02-16 13:09:32 notice Cluster store state: clean cluster-id=63f8f108-5a60-4efe-8506-27385006c5fc shutdown-id=c058c1be-31ea-44a9-bc71-e76c37732d8f Feb 16 13:09:32 gateway04 qpidd[7100]: 2012-02-16 13:09:32 notice cluster(192.168.9.92:7100 PRE_INIT) configuration change: 192.168.9.92:7100 Feb 16 13:09:32 gateway04 qpidd[7100]: 2012-02-16 13:09:32 notice cluster(192.168.9.92:7100 PRE_INIT) Members joined: 192.168.9.92:7100 Feb 16 13:09:32 gateway04 qpidd[7100]: 2012-02-16 13:09:32 notice Journal "...": Created Feb 16 13:09:32 gateway04 qpidd[7100]: 2012-02-16 13:09:32 notice Journal "...": Created Feb 16 13:09:32 gateway04 qpidd[7100]: 2012-02-16 13:09:32 notice Listening on TCP/TCP6 port 5672 Feb 16 13:09:32 gateway04 qpidd[7100]: 2012-02-16 13:09:32 notice Listening for SSL connections on TCP port 5671 Feb 16 13:09:32 gateway04 qpidd[7100]: 2012-02-16 13:09:32 notice cluster(192.168.9.92:7100 INIT) cluster-uuid = 63f8f108-5a60-4efe-8506-27385006c5fc Feb 16 13:09:32 gateway04 qpidd[7100]: 2012-02-16 13:09:32 notice cluster(192.168.9.92:7100 READY) joined cluster "mm-queue-cluster" Feb 16 13:09:32 gateway04 qpidd[7100]: 2012-02-16 13:09:32 notice Broker running (client connected here...) (broker on gateway03 started here...) Feb 16 13:09:51 gateway04 qpidd[7100]: 2012-02-16 13:09:51 notice cluster(192.168.9.92:7100 READY) configuration change: 192.168.9.90:28717 192.168.9.92:7100 Feb 16 13:09:51 gateway04 qpidd[7100]: 2012-02-16 13:09:51 notice cluster(192.168.9.92:7100 READY) Members joined: 192.168.9.90:28717 Feb 16 13:09:51 gateway04 qpidd[7100]: 2012-02-16 13:09:51 notice cluster(192.168.9.92:7100 UPDATER) sending update to 192.168.9.90:28717 at amqp:ssl:gateway03:5671 Feb 16 13:10:19 gateway04 qpidd[7100]: 2012-02-16 13:10:19 warning Closing connection due to Desired mechanism(s) not valid: CRAM-MD5 (supported: ANONYMOUS PLAIN) (qpid/client/ConnectionHandler.cpp:252) Feb 16 13:10:19 gateway04 qpidd[7100]: 2012-02-16 13:10:19 error cluster(192.168.9.92:7100 UPDATER) error sending update: Desired mechanism(s) not valid: CRAM-MD5 (supported: ANONYMOUS PLAIN) (qpid/client/ConnectionHandler.cpp:252) Feb 16 13:10:19 gateway04 qpidd[7100]: 2012-02-16 13:10:19 notice cluster(192.168.9.92:7100 UPDATER) update sent Feb 16 13:10:19 gateway04 qpidd[7100]: 2012-02-16 13:10:19 warning Session was not closed cleanly: guest.x-qpid.cluster-update Feb 16 13:10:19 gateway04 qpidd[7100]: 2012-02-16 13:10:19 warning Connection [37089 192.168.9.90:5671] closed Feb 16 13:10:19 gateway04 qpidd[7100]: 2012-02-16 13:10:19 notice cluster(192.168.9.92:7100 READY) configuration change: 192.168.9.92:7100 Feb 16 13:10:19 gateway04 qpidd[7100]: 2012-02-16 13:10:19 notice cluster(192.168.9.92:7100 READY) Members left: 192.168.9.90:28717
(broker on gateway04 already started, but not gateway03) 2012-02-16 13:09:42 debug Created connection ssl:gateway04:5671 with {password:guest, username:guest} 2012-02-16 13:09:42 info Trying to connect to ssl:gateway04:5671... 2012-02-16 13:09:42 debug Created IO thread: 0 2012-02-16 13:09:42 debug SslConnector created for \x00- 2012-02-16 13:09:42 debug RECV [[52892 192.168.9.92:5671]]: INIT(0-10) 2012-02-16 13:09:42 debug CyrusSasl::start(PLAIN CRAM-MD5 DIGEST-MD5) 2012-02-16 13:09:42 debug external SSF detected and set to 128 2012-02-16 13:09:42 debug external auth detected and set to dummy 2012-02-16 13:09:42 debug min_ssf: 0, max_ssf: 256 2012-02-16 13:09:42 debug CyrusSasl::start(PLAIN CRAM-MD5 DIGEST-MD5): selected DIGEST-MD5 response: '' 2012-02-16 13:09:42 debug getUserFromSettings(): guest 2012-02-16 13:09:42 debug CyrusSasl::step(nonce="DrF43YwEJoxB4MLHa7s12XjlVJVl6uKMfrqlcfsWlkE=",realm="QPID",qop="auth",maxbuf=65535,charset=utf-8,algorithm=md5-sess): username="guest",realm="QPID",nonce="DrF43YwEJoxB4MLHa7s12XjlVJVl6uKMfrqlcfsWlkE=",cnonce="ElxXCVBQPW7vRNzqMU0fNOcLQWaBZoTt8E3HJsTliLY=",nc=00000001,qop=auth,maxbuf=65535,digest-uri="qpidd/gateway04",response=a03283bbb65221600b7917462e49945b 2012-02-16 13:09:42 debug CyrusSasl::step(rspauth=5980599c97fee618ad059b00a381dd5b): 2012-02-16 13:09:42 debug Known-brokers for connection: amqp:ssl:gateway04:5671 2012-02-16 13:09:42 info Connection [52892 192.168.9.92:5671] connected to ssl:gateway04:5671 2012-02-16 13:09:42 debug Connection [52892 192.168.9.92:5671] no security layer in place 2012-02-16 13:09:42 info Connected to ssl:gateway04:5671 2012-02-16 13:09:42 debug Added known-hosts, reconnect-urls=[ssl:gateway04:5671, amqp:ssl:gateway04:5671] sleeping...
(broker on gateway04 already started) (client already connected to gateway04) Feb 16 13:09:50 gateway03 qpidd[28717]: 2012-02-16 13:09:50 notice Journal "TplStore": Created Feb 16 13:09:50 gateway03 qpidd[28717]: 2012-02-16 13:09:50 notice Store module initialized; store-dir=/var/lib/qpidd Feb 16 13:09:50 gateway03 qpidd[28717]: 2012-02-16 13:09:50 notice Initializing CPG Feb 16 13:09:50 gateway03 qpidd[28717]: 2012-02-16 13:09:50 notice Cluster store state: dirty cluster-id=63f8f108-5a60-4efe-8506-27385006c5fc Feb 16 13:09:50 gateway03 qpidd[28717]: 2012-02-16 13:09:50 notice cluster(192.168.9.90:28717 PRE_INIT) configuration change: 192.168.9.90:28717 192.168.9.92:7100 Feb 16 13:09:50 gateway03 qpidd[28717]: 2012-02-16 13:09:50 notice cluster(192.168.9.90:28717 PRE_INIT) Members joined: 192.168.9.90:28717 Feb 16 13:09:50 gateway03 qpidd[28717]: 2012-02-16 13:09:50 notice Cluster recovery: recovered journal data discarded and journal files pushed down Feb 16 13:09:50 gateway03 qpidd[28717]: 2012-02-16 13:09:50 notice Store directory /var/lib/qpidd/rhm was pushed down (saved) into directory /var/lib/qpidd/_cluster.bak.0005. Feb 16 13:09:50 gateway03 qpidd[28717]: 2012-02-16 13:09:50 notice Journal "TplStore": Created Feb 16 13:09:50 gateway03 qpidd[28717]: 2012-02-16 13:09:50 notice Listening on TCP/TCP6 port 5672 Feb 16 13:09:50 gateway03 qpidd[28717]: 2012-02-16 13:09:50 notice Listening for SSL connections on TCP port 5671 Feb 16 13:09:50 gateway03 qpidd[28717]: 2012-02-16 13:09:50 notice cluster(192.168.9.90:28717 INIT) cluster-uuid = 63f8f108-5a60-4efe-8506-27385006c5fc Feb 16 13:09:50 gateway03 qpidd[28717]: 2012-02-16 13:09:50 notice cluster(192.168.9.90:28717 JOINER) joining cluster "mm-queue-cluster" Feb 16 13:09:50 gateway03 qpidd[28717]: 2012-02-16 13:09:50 notice Broker running Feb 16 13:09:50 gateway03 qpidd[28717]: 2012-02-16 13:09:50 notice cluster(192.168.9.90:28717 UPDATEE) receiving update from 192.168.9.92:7100 Feb 16 13:09:51 gateway03 qpidd[28717]: 2012-02-16 13:09:51 notice Journal "...": Created (queues being added to persistent store...) Feb 16 13:10:17 gateway03 qpidd[28717]: 2012-02-16 13:10:17 notice Journal "...": Created Feb 16 13:10:18 gateway03 qpidd[28717]: 2012-02-16 13:10:18 warning Client closed connection with 501: Desired mechanism(s) not valid: CRAM-MD5 (supported: ANONYMOUS PLAIN) (qpid/client/ConnectionHandler.cpp:252) Feb 16 13:10:18 gateway03 qpidd[28717]: 2012-02-16 13:10:18 critical cluster(192.168.9.90:28717 UPDATEE) catch-up connection closed prematurely 192.168.9.92:5671-192.168.9.92:52892(192.168.9.90:28717-2 local,catchup) Feb 16 13:10:18 gateway03 qpidd[28717]: 2012-02-16 13:10:18 notice cluster(192.168.9.90:28717 LEFT) leaving cluster "mm-queue-cluster" Feb 16 13:10:18 gateway03 qpidd[28717]: 2012-02-16 13:10:18 notice Shut down
I'm only too happy try anything that might help trace the problem!