ActiveMQ Classic / AMQ-4720

Messages lost after fail-back of a network connector using priorityBackup=true - reason is that remote broker isn't checking producerID & is rejecting because of duplicate producerSequence


Details

    Description

      Summary of problem:
      -------------------
      If a static failover network connector is set up to connect to 2 other brokers & to fail back to a priority broker, messages can be lost after fail-back: the remote broker discards them because of duplicate producer-sequence numbers, even though the producer-id has changed.
      My suspicion is that the remote broker doesn't recognise that the re-established connection is a network connection & so doesn't check producer-id.

      Test-harness setup:
      -------------------
      Using ActiveMQ 5.8.0 binary download.
      Only changes are to logging settings & to the configuration file.

      3 brokers ("amq1", "amq2", "amq3"), all brokers running on localhost.
      Each uses its own config file (amq1.xml, amq2.xml, amq3.xml).

      Broker amq1 has a failover duplex connection to amq2.
      Broker amq3 has a duplex failover connection to both amq1 + amq2, it is configured to always try to connect to amq1 first ("randomize=false") and to fail-back to amq1 if it comes back online ("priorityBackup=true")
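As a rough illustration of the amq3 connection settings described above, the sketch below assembles a failover transport URI with "randomize=false" and "priorityBackup=true". The helper name and the amq2 port are assumptions (the report only says that all brokers run on localhost), and the actual amq3.xml attachment may wrap or extend this URI differently.

```python
def failover_uri(brokers, randomize=False, priority_backup=True):
    """Build a failover transport URI; the first broker listed is the preferred one."""
    options = {
        "randomize": str(randomize).lower(),
        "priorityBackup": str(priority_backup).lower(),
    }
    query = "&".join(f"{key}={value}" for key, value in options.items())
    return f"failover:({','.join(brokers)})?{query}"


# Hypothetical addresses: the amq2 port in particular is a guess.
AMQ1 = "tcp://localhost:61616"
AMQ2 = "tcp://localhost:61617"

amq3_uri = failover_uri([AMQ1, AMQ2])
print(amq3_uri)
```

With randomize disabled, amq1 is listed first so that it is tried first, and priorityBackup is what asks the transport to return to it once it is reachable again.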

      Consumer connects to broker amq2
      Producer connects to broker amq3

      Test-harness sender application creates a new session each time it is run & sends a set of messages.
      The sending session is not transacted & is set to auto-acknowledge.
      Messages are sent with persistent delivery mode.
      Messages are on queue "MyQueue"

      Test script:
      ------------
      Start all 3 brokers.
      Broker amq3 establishes a connection to amq1.
      Broker amq1 establishes a connection to amq2.

      Consumer connects to amq2 & starts consuming queue "MyQueue".
      Producer connects to amq3 & sends 10 messages on queue "MyQueue" - these are all passed on to broker amq1 which forwards them to amq2 where they are delivered to the consumer.
      Producer connects to amq3 & sends 10 messages on queue "MyQueue" - these are all delivered as before - N.B. producerID is different as this is a new connection.

      Broker amq1 is shut down.
      Broker amq3 fails-over to connect to amq2.

      Producer connects to amq3 & sends 10 messages on queue "MyQueue" - these are all passed directly to amq2 where they are delivered to the consumer - (as before, the producer-id has changed).
      Producer connects to amq3 & sends 10 messages on queue "MyQueue" - these are delivered as before.

      Broker amq1 is restarted
      Broker amq1 re-establishes its connection to amq2.

      Broker amq3 notices that amq1 is available & fails-back to it.

      • Broker amq3 closes its connection to amq2
      • Broker amq3 starts a new connection to amq1

      Producer connects to amq3 & sends 10 messages on queue "MyQueue" - these are all passed on to amq1, which forwards them to amq2 where they are delivered to the consumer - (as before, the producer-id has changed).

      • N.B. Immediately before the first message is received & forwarded by amq1, amq1's log shows:
        2013-09-11 12:05:56,639 | DEBUG | last stored sequence id set: -1 | org.apache.activemq.broker.ProducerBrokerExchange | ActiveMQ Transport: tcp:///172.16.7.85:56880@61616
        ---> This message only appears after fail-back, it doesn't appear earlier.
        This is indicative of the network connection being treated differently after fail-back.

      ************************
      ** Error occurs now **
      ************************

      • Producer connects to amq3 & sends 20 messages on queue "MyQueue" (with a different producer-ID)
        • The first 10 are deleted by broker amq1 because it thinks that they have a duplicate sequence ID.
        • amq1 log shows:
          2013-09-11 12:06:29,201 | DEBUG | suppressing duplicate message send [ID:bd7ewandymay-56895-1378897588954-0:1:1:1:1] with producerSequenceId [1] less than last stored: 10 | org.apache.activemq.broker.ProducerBrokerExchange | ActiveMQ Transport: tcp:///172.16.7.85:56880@61616
          2013-09-11 12:06:29,223 | DEBUG | suppressing duplicate message send [ID:bd7ewandymay-56895-1378897588954-0:1:1:1:2] with producerSequenceId [2] less than last stored: 10 | org.apache.activemq.broker.ProducerBrokerExchange | ActiveMQ Transport: tcp:///172.16.7.85:56880@61616
          ... snip ...
          2013-09-11 12:06:29,396 | DEBUG | suppressing duplicate message send [ID:bd7ewandymay-56895-1378897588954-0:1:1:1:10] with producerSequenceId [10] less than last stored: 10 | org.apache.activemq.broker.ProducerBrokerExchange | ActiveMQ Transport: tcp:///172.16.7.85:56880@61616
        • The last 10 are successfully forwarded to amq2, where they are consumed.
      • Producer connects to amq3 & sends 30 messages on queue "MyQueue" (with a different producer-ID)
        • The first 20 are deleted by broker amq1 because it thinks that they have a duplicate sequence ID.
        • amq1 log shows:
          2013-09-11 12:06:45,668 | DEBUG | suppressing duplicate message send [ID:bd7ewandymay-56899-1378897605440-0:1:1:1:1] with producerSequenceId [1] less than last stored: 20 | org.apache.activemq.broker.ProducerBrokerExchange | ActiveMQ Transport: tcp:///172.16.7.85:56880@61616
          2013-09-11 12:06:45,682 | DEBUG | suppressing duplicate message send [ID:bd7ewandymay-56899-1378897605440-0:1:1:1:2] with producerSequenceId [2] less than last stored: 20 | org.apache.activemq.broker.ProducerBrokerExchange | ActiveMQ Transport: tcp:///172.16.7.85:56880@61616
          ... snip ...
          2013-09-11 12:06:45,959 | DEBUG | suppressing duplicate message send [ID:bd7ewandymay-56899-1378897605440-0:1:1:1:20] with producerSequenceId [20] less than last stored: 20 | org.apache.activemq.broker.ProducerBrokerExchange | ActiveMQ Transport: tcp:///172.16.7.85:56880@61616
        • The last 10 are successfully forwarded to amq2, where they are consumed.
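To count how many sends were suppressed per producer, the DEBUG lines quoted above can be scanned with a small script. This is only a sketch: the regular expression is written against the lines shown in this report, and it assumes that a message ID is the producer ID followed by ":" and the sequence number.

```python
import re
from collections import Counter

SUPPRESSED = re.compile(
    r"suppressing duplicate message send \[(?P<message_id>ID:[^\]]+)\] "
    r"with producerSequenceId \[(?P<sequence>\d+)\] "
    r"less than last stored: (?P<last_stored>-?\d+)"
)


def suppressed_sends(lines):
    """Yield (producer_id, sequence, last_stored) for each suppression line."""
    for line in lines:
        match = SUPPRESSED.search(line)
        if match:
            # Assumption: dropping the final ":<n>" of the message ID gives the producer ID.
            producer_id = match["message_id"].rsplit(":", 1)[0]
            yield producer_id, int(match["sequence"]), int(match["last_stored"])


# Three lines copied from the amq1 log excerpts in this report.
sample = [
    "2013-09-11 12:05:56,639 | DEBUG | last stored sequence id set: -1 | org.apache.activemq.broker.ProducerBrokerExchange | ActiveMQ Transport: tcp:///172.16.7.85:56880@61616",
    "2013-09-11 12:06:29,201 | DEBUG | suppressing duplicate message send [ID:bd7ewandymay-56895-1378897588954-0:1:1:1:1] with producerSequenceId [1] less than last stored: 10 | org.apache.activemq.broker.ProducerBrokerExchange | ActiveMQ Transport: tcp:///172.16.7.85:56880@61616",
    "2013-09-11 12:06:45,668 | DEBUG | suppressing duplicate message send [ID:bd7ewandymay-56899-1378897605440-0:1:1:1:1] with producerSequenceId [1] less than last stored: 20 | org.apache.activemq.broker.ProducerBrokerExchange | ActiveMQ Transport: tcp:///172.16.7.85:56880@61616",
]

records = list(suppressed_sends(sample))
per_producer = Counter(producer_id for producer_id, _, _ in records)
for producer_id, count in per_producer.items():
    print(producer_id, count)
```

Run against the full amq1.log (for example by passing an open file object instead of `sample`), this should show distinct producer IDs being suppressed against the same "last stored" counter.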

      It looks to me as if amq1 doesn't realise that the fail-back network connection established by amq3 is a network connection & so isn't checking producer IDs.
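The suspicion above can be checked against the observed numbers with a toy model. The class below is not ActiveMQ code; it is a hypothetical duplicate filter that remembers the last stored sequence number either per connection (ignoring the producer ID) or per producer, and suppresses any send whose sequence is not greater than the stored value.

```python
class DuplicateFilter:
    """Toy model of sequence-based duplicate suppression (not ActiveMQ's code)."""

    def __init__(self, per_producer):
        self.per_producer = per_producer
        self.last_stored = {}

    def accept(self, connection, producer_id, sequence):
        # Keying by connection alone models a broker that ignores the producer ID.
        key = (connection, producer_id) if self.per_producer else connection
        if sequence <= self.last_stored.get(key, -1):
            return False  # treated as a duplicate and dropped
        self.last_stored[key] = sequence
        return True


def run(per_producer, batch_sizes=(10, 20, 30)):
    """Send each batch from a fresh producer over one network connection."""
    broker = DuplicateFilter(per_producer)
    delivered = []
    for number, size in enumerate(batch_sizes, start=1):
        producer_id = f"producer-{number}"  # new producer ID per run, as in the test
        accepted = sum(
            broker.accept("amq3->amq1", producer_id, sequence)
            for sequence in range(1, size + 1)
        )
        delivered.append(accepted)
    return delivered


observed = run(per_producer=False)
expected = run(per_producer=True)
print("keyed by connection only:", observed)
print("keyed by producer ID:    ", expected)
```

Keyed by connection only, the model delivers 10 messages from each of the 10, 20 and 30 message batches, which matches the counts reported after fail-back; keyed by producer ID, nothing is lost.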

      Details of why I'm trying this configuration:
      ---------------------------------------------

      Use case:
      ---------
      1 central site.
      Multiple branches, each with a single branch server and multiple user PCs.
      Each branch only has 1 internet connection that is shared by branch server & PCs.
      Branch server is typically unreliable hardware & may go offline without notice.
      Resilience to network loss is important & so each PC & server has its own broker.
      Both branch server & PCs need to be able to communicate with the centre

      To reduce the number of connections into the centre, we would like a tree topology with the branch server concentrating all branch PC messages & forwarding them to the centre.
      But PCs generate a data feed that we want to be able to access at the centre, even when the branch server is offline.

      Proposed configuration:
      -----------------------
      Use a failover network connection on branch PCs & configure the connection to prioritise a connection to the branch server, but open a direct connection to the centre if the branch server is unavailable.
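The intended behaviour of the proposed configuration can be sketched as a simple priority choice. The function and broker names below are hypothetical and only model the desired outcome (prefer the branch server, fall back to the centre, fail back when the branch server returns); they say nothing about how the failover transport implements it.

```python
def choose_target(priority_order, online):
    """Return the first online broker in priority order, or None if none is up."""
    for broker in priority_order:
        if broker in online:
            return broker
    return None


# Hypothetical names for the two upstream brokers of a branch PC.
PRIORITY = ["branch-server", "centre"]

timeline = [
    {"branch-server", "centre"},  # normal operation
    {"centre"},                   # branch server goes offline
    {"branch-server", "centre"},  # branch server comes back: fail-back
]

targets = [choose_target(PRIORITY, online) for online in timeline]
print(targets)
```

The third step is the fail-back that triggers the message loss described in this issue.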

      Attachments

        1. amq3.log (447 kB)
        2. amq3.xml (2 kB)
        3. amq2.log (228 kB)
        4. amq2.xml (1 kB)
        5. amq1.xml (1 kB)
        6. amq1.log (315 kB)

        All attachments uploaded by Andrew May.
      People

        Assignee: Unassigned
        Reporter: Andrew May (andymay)
        Votes: 0
        Watchers: 4