ActiveMQ / AMQ-2102

Master/slave out of sync with multiple consumers

    Details

    • Type: Bug
    • Status: Resolved
    • Priority: Major
    • Resolution: Fixed
    • Affects Version/s: 5.2.0
    • Fix Version/s: 5.3.0
    • Component/s: Broker
    • Labels:
      None

      Description

      I'm seeing exceptions like this in a simple master/slave setup:

      ERROR Service - Async error occurred: javax.jms.JMSException: Slave broker out of sync with master: Dispatched message (ID:DUL1SJAMES-L2-1231-1233929569359-0:4:1:1:207) was not in the pending list for MasterSlaveBug
      javax.jms.JMSException: Slave broker out of sync with master: Dispatched message (ID:DUL1SJAMES-L2-1231-1233929569359-0:4:1:1:207) was not in the pending list for MasterSlaveBug

      The problem happens only when multiple consumers are listening to the queue, and it becomes more likely as the number of consumers grows. I've written a test program that demonstrates the problem.

      I start the master and slave with an empty data directory and let them both start up and settle. Then I start the test program. It creates a specified number of consumers and then starts queuing 256 messages. Each consumer processes a message by sending a reply, and the producer counts the replies. The consumers and the producer see all the messages, but with multiple consumers it is very likely that the error above will occur and that several of the messages will still be queued on the slave.

      While debugging through the ActiveMQ code, I noticed that the master and the slave each dispatch the message to a consumer's pending list independently. In other words, it is possible that the master will add the message to consumer A's pending list while the slave adds it to consumer B's pending list. Once the message has been processed by consumer A, the master sends a message to the slave which specifies consumer A so that the slave can remove the message. The slave looks in its copy of consumer A's pending list and cannot find the message. As a result, it throws this exception and the message stays stuck on consumer B's pending list on the slave.
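
      The mismatch described above can be illustrated with a small stand-alone simulation. This is only a sketch and uses no ActiveMQ classes: the names PendingListMismatch, Broker, dispatch, acknowledge and run are hypothetical, and round-robin dispatch with the slave offset by one consumer is an assumption chosen to make the two brokers disagree deterministically. In the real broker the disagreement is intermittent, so only some messages are affected.

```java
import java.util.ArrayDeque;
import java.util.ArrayList;
import java.util.Deque;
import java.util.List;

public class PendingListMismatch {

    /** Hypothetical stand-in for a broker: one pending list per consumer. */
    static final class Broker {
        private final List<Deque<String>> pending = new ArrayList<>();
        private int next; // index of the consumer that receives the next message

        Broker(int consumers, int firstConsumer) {
            for (int i = 0; i < consumers; i++) {
                pending.add(new ArrayDeque<>());
            }
            next = firstConsumer % consumers;
        }

        /** Dispatches round-robin and returns the index of the chosen consumer. */
        int dispatch(String messageId) {
            int chosen = next;
            pending.get(chosen).add(messageId);
            next = (next + 1) % pending.size();
            return chosen;
        }

        /** Removes an acknowledged message; returns false if it was not pending for that consumer. */
        boolean acknowledge(int consumer, String messageId) {
            return pending.get(consumer).remove(messageId);
        }

        /** Counts messages still sitting on any consumer's pending list. */
        int stuckMessages() {
            int total = 0;
            for (Deque<String> list : pending) {
                total += list.size();
            }
            return total;
        }
    }

    /** Returns {acks the slave could not match, messages left pending on the slave}. */
    static int[] run(int consumers, int messages) {
        Broker master = new Broker(consumers, 0);
        // Assumption: the slave starts one consumer later, modelling an independent dispatch decision.
        Broker slave = new Broker(consumers, 1);
        int outOfSync = 0;
        for (int i = 0; i < messages; i++) {
            String id = "msg-" + i;
            int consumer = master.dispatch(id); // master picks a consumer
            slave.dispatch(id);                 // slave picks on its own
            master.acknowledge(consumer, id);
            // The master tells the slave which consumer processed the message.
            if (!slave.acknowledge(consumer, id)) {
                outOfSync++; // corresponds to "was not in the pending list"
            }
        }
        return new int[] {outOfSync, slave.stuckMessages()};
    }

    public static void main(String[] args) {
        for (int consumers : new int[] {1, 16}) {
            int[] result = run(consumers, 256);
            System.out.println(consumers + " consumer(s): " + result[0]
                    + " out-of-sync acks, " + result[1] + " messages stuck on slave");
        }
    }
}
```

      With a single consumer both brokers can only choose the same pending list, so nothing goes out of sync, which matches the observation that the bug does not appear with one consumer. With sixteen consumers the two brokers pick different lists, the slave cannot match the acknowledgement, and the message stays on the other consumer's list.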

      Master and slave configurations along with MasterSlaveBug.java are attached to this issue.

      Start master and slave brokers:
      activemq xbean:master.xml
      activemq xbean:slave.xml

      Run with (only one consumer, the bug does not appear):
      java -classpath .:activemq-all-5.2.0.jar MasterSlaveBug 1
      Run with (sixteen consumers, the bug does appear):
      java -classpath .:activemq-all-5.2.0.jar MasterSlaveBug 16

      1. slaveDispatchOnNotification.patch
        16 kB
        Gary Tully
      2. slave.xml
        2 kB
        Dan James
      3. MasterSlavePatch.patch
        1 kB
        ying
      4. MasterSlaveBug.java
        13 kB
        Dan James
      5. master.xml
        2 kB
        Dan James
      6. AMQ-2102-03102009.patch
        23 kB
        ying
      7. AMQ2102.12-03.patch
        28 kB
        Gary Tully

        Activity

        Dan James created issue -
        Dan James made changes -
        Field Original Value New Value
        Attachment MasterSlaveBug.java [ 17617 ]
        Dan James made changes -
        Attachment master.xml [ 17618 ]
        Dan James made changes -
        Attachment slave.xml [ 17619 ]
        Gary Tully made changes -
        Assignee Gary Tully [ gtully ]
        ying made changes -
        Attachment MasterSlavePatch.patch [ 17720 ]
        Gary Tully made changes -
        Attachment slaveDispatchOnNotification.patch [ 17721 ]
        ying made changes -
        Attachment AMQ-2102-03102009.patch [ 17756 ]
        Gary Tully made changes -
        Attachment AMQ2102.12-03.patch [ 17767 ]
        Gary Tully made changes -
        Fix Version/s 5.3.0 [ 11914 ]
        Resolution Fixed [ 1 ]
        Status Open [ 1 ] Resolved [ 5 ]
        Jeff Turner made changes -
        Project Import Fri Nov 26 22:32:02 EST 2010 [ 1290828722158 ]

          People

          • Assignee:
            Gary Tully
            Reporter:
            Dan James
          • Votes:
            2
            Watchers:
            4

            Dates

            • Created:
              Updated:
              Resolved:
