ActiveMQ
  1. ActiveMQ
  2. AMQ-2161

Unmatched acknowledege Exception and duplicate received messages in XA transaction

    Details

    • Type: Bug Bug
    • Status: Open
    • Priority: Critical Critical
    • Resolution: Unresolved
    • Affects Version/s: 5.2.0
    • Fix Version/s: 5.x
    • Component/s: None
    • Labels:
      None
    • Environment:

      Ubuntu Linux 2.6.24-22
      Processor with 2 cores
      Java 1.6.0_02-b05
      Bitronix Transaction Manager 1.3.2

      Description

      I tried to read about 200000 messages from a queue. Reading was performed in XA transactions in chunks of 1000 messages per transaction.
      After reading some messages (I almost never came above 100000 read messages), an exception occured in the activemq log file (data/activemq.log).
      In my application, null was returned for the message. Reading of further messages always returned null until a new transaction was opened.
      No exception was thrown and the transaction was not marked as rolled back.
      After that, the number of read messages in my application and the number of dequeued messages I saw in the activemq JMX interface was often no longer in sync. It seems to me that activemq was of the opinion that the transaction was rolled back while my application was not.
      Another indication is that my application always gets one of the read messages a second time some time after the incident.

      When I set the prefetchPolicy.all to 1, the problem first seemed to disappear. Normally, there were no more exceptions in the log file and a null message was never returned. After testing a little more, the problem also occured with prefetch size 1 (see attached log file). However, it seems to occur less often and only with more messages in the queue (I tested with 280000). Duplicate messages did not occur when I tested it once, but the JMX interface to activemq reported 280200 messages dequeued even though there were only 280000 in the queue. It also reported a size of -200 when the queue was empty.

      I also tried with prefetchPolicy.all=0, but that created a different problem: there are no exceptions either, but after some time, the application hang completely in the reader thread and never returned. I tried this only once, but probably this should also be investigated, if there is no easy explanation.

      I wrote a test case that demonstrates the bug (see attachment). It always occured for 200000 messages, and almost always for 100000 messages. For 20000 messages, for example, the problem almost never occurs. For executing the test, I first stopped the activemq server, deleted the data directory and then restarted the server. After that, I executed the test, which first writes all messages into a queue and then tries to read them back.

      The stacktrace from the activemq log file is attached.

      The configuration of the broker in my activemq.xml configuration file:

          <broker xmlns="http://activemq.apache.org/schema/core" brokerName="localhost" dataDirectory="${activemq.base}/data" persistent="true">
      
              <destinationPolicy>
                  <policyMap>
                      <policyEntries>
                          <policyEntry queue=">" memoryLimit="2000mb"/>
                          <policyEntry topic=">" memoryLimit="5mb">
                          </policyEntry>
                      </policyEntries>
                  </policyMap>
              </destinationPolicy>
              <managementContext>
                  <managementContext createConnector="false"/>
              </managementContext>
      
              <networkConnectors>
              </networkConnectors>
      
              <persistenceAdapter>
                  <amqPersistenceAdapter syncOnWrite="false" directory="${activemq.base}/data" maxFileLength="20 mb"
                                         indexBinSize="131072" indexPageSize="128kb" />
              </persistenceAdapter>
      
              <sslContext>
                  <sslContext keyStore="file:${activemq.base}/conf/broker.ks" keyStorePassword="password" 
      trustStore="file:${activemq.base}/conf/broker.ts" trustStorePassword="password"/>
              </sslContext>
      
              <systemUsage>
                  <systemUsage>
                      <memoryUsage>
                          <memoryUsage limit="1000 mb"/>
                      </memoryUsage>
                      <storeUsage>
                          <storeUsage limit="2 gb" name="foo"/>
                      </storeUsage>
                      <tempUsage>
                          <tempUsage limit="2000 mb"/>
                      </tempUsage>
                  </systemUsage>
              </systemUsage>
      
              <transportConnectors>
                  <transportConnector name="openwire" uri="tcp://localhost:61616" />
              </transportConnectors>
      
          </broker>
      

      The configuration of the transaction manager (from the Spring application context):

      <bean id="btmConfig"
              class="bitronix.tm.TransactionManagerServices" factory-method="getConfiguration">
              <property name="serverId" value="${processes.serverId}"/>
              <property name="logPart1Filename" value="${processes.bitronix.logPart1Filename}" />
              <property name="logPart2Filename" value="${processes.bitronix.logPart2Filename}" />
              <property name="warnAboutZeroResourceTransaction" value="false" />
          </bean>
          
          <bean id="bitronixTransactionManager" destroy-method="shutdown" depends-on="btmConfig"
              class="bitronix.tm.TransactionManagerServices" factory-method="getTransactionManager">
              <property name="transactionTimeout" value="600000"/>
          </bean>
          
          <bean id="transactionManager" class="org.springframework.transaction.jta.JtaTransactionManager">
              <property name="transactionManager" ref="bitronixTransactionManager" />
              <property name="userTransaction" ref="bitronixTransactionManager" />
              <property name="autodetectTransactionManager" value="false" />
              <property name="autodetectUserTransaction" value="false" />
              <property name="defaultTimeout" value="600000" />
          </bean>
          
          <bean id="transactionTemplate"
              class="org.springframework.transaction.support.TransactionTemplate">
              <property name="transactionManager" ref="transactionManager" />
          </bean>
      

      The queue definition from the applicationContext:

      <amq:queue id="loadtestqueue" physicalName="loadtestqueue" />
      
      1. activemq-error-async.log
        23 kB
        Michael Gottschalk
      2. ActiveMqLoadTest.java
        6 kB
        Michael Gottschalk
      3. activemq-error-prefetch1.log
        5 kB
        Michael Gottschalk

        Activity

        Hide
        Michael Gottschalk added a comment -

        Extract from the ActiveMQ log file with the exceptions.

        Show
        Michael Gottschalk added a comment - Extract from the ActiveMQ log file with the exceptions.
        Hide
        Michael Gottschalk added a comment -

        Attached the TestNG test case.

        Show
        Michael Gottschalk added a comment - Attached the TestNG test case.
        Hide
        Michael Gottschalk added a comment -

        Added a log file which shows that the error also occurs with prefetch size 1.

        Show
        Michael Gottschalk added a comment - Added a log file which shows that the error also occurs with prefetch size 1.
        Hide
        Gary Tully added a comment -

        Just had a quick look at this test to see if it could easily be validated for 5.4, I notice the use of a pooled connection factory. For producers this makes sense, but for consumers it can lead to dangling prefetched messages if the consumers are added back to the pool. Just wanted to add this in case it is related to the problem.

        Show
        Gary Tully added a comment - Just had a quick look at this test to see if it could easily be validated for 5.4, I notice the use of a pooled connection factory. For producers this makes sense, but for consumers it can lead to dangling prefetched messages if the consumers are added back to the pool. Just wanted to add this in case it is related to the problem.

          People

          • Assignee:
            Unassigned
            Reporter:
            Michael Gottschalk
          • Votes:
            8 Vote for this issue
            Watchers:
            11 Start watching this issue

            Dates

            • Created:
              Updated:

              Development