QPID-3495

Bidirectional messaging using the same queue causes cluster node restart to fail.

Details

    • Type: Bug
    • Status: Closed
    • Priority: Major
    • Resolution: Won't Fix
    • Affects Version/s: 0.12
    • Fix Version/s: None
    • Component/s: C++ Broker, C++ Clustering
    • Environment:
      • RHEL release 5.5 (Tikanga) x64
      • OpenAIS 0.80.6-16.el5
      • Fails on both: qpid-cpp-0.12 (Apache) and qpid-cpp-0.10 (MRG)

    Description

      Description of the problem

      1) Start a cluster on two nodes, N1 and N2.
      2) Start process p1 with consumer C1 and producer P1 for queue Q. C1 uses JMS target T1.
      3) Start process p2 with consumer C2 and producer P2 for queue Q. P2 sends one text message per second to queue Q with target T1. C2 uses JMS target T2.
      4) Confirm with tcpdump that N1 is receiving the traffic.
      5) Shut down node N1 with 'qpidd --quit'.
      6) Wait for 5 seconds.
      7) Restart node N1 with 'qpidd'.
      8) Check qpidd.log for the error "catch-up connection closed prematurely".
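
      Steps 5 to 8 can be scripted so that the restart is repeatable. The following is a minimal sketch, not part of the attached test code. It assumes it runs on N1, that qpidd picks up a configuration with daemon=yes (otherwise the second call would block), and that the caller reads the log file itself. The names restart_node and restart_failed are hypothetical.

```python
import subprocess
import time

# Error text from step 8; its presence in the log means the rejoin failed.
FAILURE_MARKER = "catch-up connection closed prematurely"


def restart_node(run=subprocess.run, sleep=time.sleep, pause_seconds=5):
    """Steps 5 to 7: stop the local broker, wait, then start it again.

    `run` and `sleep` are injectable so the sequence can be exercised
    without a real broker.
    """
    run(["qpidd", "--quit"], check=True)  # step 5: shut down the running daemon
    sleep(pause_seconds)                  # step 6: wait before restarting
    run(["qpidd"], check=True)            # step 7: assumes daemon=yes in the config


def restart_failed(log_text):
    """Step 8: report whether the log text contains the catch-up failure."""
    return FAILURE_MARKER in log_text
```

      After calling restart_node(), read the file named by log-to-file and pass its contents to restart_failed().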

      Test code is attached to this bug report. The case is actually simpler than the description above suggests, and reading the test code should clarify the problem.

      Qpid log when trying to restart

      2011-09-19 14:00:43 notice Initializing CPG
      2011-09-19 14:00:43 notice cluster(172.16.133.123:19037 PRE_INIT) configuration change: 172.16.133.120:29504 172.16.133.123:19037
      2011-09-19 14:00:43 notice cluster(172.16.133.123:19037 PRE_INIT) Members joined: 172.16.133.123:19037
      2011-09-19 14:00:43 notice SASL disabled: No Authentication Performed
      2011-09-19 14:00:43 notice Listening on TCP port 5672
      2011-09-19 14:00:43 notice cluster(172.16.133.123:19037 INIT) cluster-uuid = 7ab02e1b-67dd-4fed-b176-79b567ab699f
      2011-09-19 14:00:43 notice cluster(172.16.133.123:19037 JOINER) joining cluster MYCLUSTER
      2011-09-19 14:00:43 notice Broker running
      2011-09-19 14:00:43 notice cluster(172.16.133.123:19037 UPDATEE) receiving update from 172.16.133.120:29504
      2011-09-19 14:00:43 error deliveryRecord no update message (qpid/cluster/Connection.cpp:537)
      2011-09-19 14:00:43 critical cluster(172.16.133.123:19037 UPDATEE) catch-up connection closed prematurely 172.16.133.120:5672-172.16.136.143:53170(172.16.133.123:19037-4 local,catchup)
      2011-09-19 14:00:43 notice cluster(172.16.133.123:19037 LEFT) leaving cluster MYCLUSTER
      2011-09-19 14:00:43 notice Shut down 
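
      To make the failure point easier to see, the log above can be reduced to the sequence of cluster states plus any error or critical lines. This is a rough sketch that assumes the line layout shown in this report (date, time, level, message); the regular expressions and the name summarize are illustrative and not part of Qpid.

```python
import re

# Layout assumed from the log in this report: "<date> <time> <level> <message>".
LINE_RE = re.compile(
    r"^(?P<date>\d{4}-\d{2}-\d{2}) (?P<time>\d{2}:\d{2}:\d{2}) "
    r"(?P<level>\w+) (?P<message>.*)$"
)
# Cluster messages look like "cluster(<member id> <STATE>) ...".
STATE_RE = re.compile(r"cluster\(\S+ (?P<state>[A-Z_]+)\)")


def summarize(log_text):
    """Return (states, problems) for a qpidd log excerpt.

    states   : cluster states in order of appearance, consecutive repeats dropped
    problems : (level, message) pairs for every error or critical line
    """
    states = []
    problems = []
    for raw in log_text.splitlines():
        match = LINE_RE.match(raw.strip())
        if not match:
            continue  # skip blank or unrecognized lines
        state = STATE_RE.search(match["message"])
        if state and (not states or states[-1] != state["state"]):
            states.append(state["state"])
        if match["level"] in ("error", "critical"):
            problems.append((match["level"], match["message"]))
    return states, problems
```

      For the log in this report, the states come out as PRE_INIT, INIT, JOINER, UPDATEE, LEFT, and the two problem lines are the deliveryRecord error and the catch-up failure, both logged while the node is in the UPDATEE state.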
      

      Configuration files

      qpidd.conf

      cluster-mechanism=ANONYMOUS
      cluster-name=MYCLUSTER
      log-to-file=/home/qpid/qpid.log
      daemon=yes
      no-data-dir=yes
      auth=no
      

      openais.conf

      totem {
      	version: 2
      	secauth: off
      	threads: 0
      	interface {
      		ringnumber: 0
      		bindnetaddr: 172.16.133.0
      		mcastaddr: 226.94.1.1
      		mcastport: 5405
      	}
      }
      
      logging {
      	debug: off
      	timestamp: on
      }
      
      amf {
      	mode: disabled
      }
      

      Steps for running the test that reproduces the problem

      1) Download, extract and compile attachment qpid-cluster-problem.tar.gz
      2) Start two Qpid nodes in a cluster
      3) Start ConsumerTest.class and ProducerTest.class on a third node
      4) Restart nodes as described previously
      5) Set ProducerTest.TEST_SCENARIO=2 in the source code and test again. The cluster node restart should now succeed because the producer and consumer are attached to separate queues in both ConsumerTest and ProducerTest.

      Attachments

        1. qpid-cluster-problem.tar.gz
          16 kB
          Jaakko Nyman

          People

            Assignee: aconway Alan Conway
            Reporter: fijany Jaakko Nyman