ActiveMQ Artemis / ARTEMIS-3992

Store corruption and broker instability with rollback of XA transactions

Details

    • Type: Bug
    • Status: Reopened
    • Priority: Major
    • Resolution: Unresolved
    • Affects Version/s: 2.16.0
    • Fix Version/s: None
    • Component/s: Broker
    • Labels: None

    Description

      Edit: I had incorrect information about the timing of the upgrade to 2.24.0. The issue last recurred just before the upgrade, so its final status on 2.24.0 is still pending.

      We are experiencing a major stability issue with Artemis that seems to be triggered by expired XA transactions.

      It starts with a bunch of timeouts like

      2022-09-13 00:00:02,970 WARN  [org.apache.activemq.artemis.core.server] AMQ222103: transaction with xid XidImpl (2133539424 (...) timed out

      Then a lot of recurring exceptions on the persistent store

      AMQ222055: Error on deleting duplicate cache: java.lang.IllegalStateException: Cannot find add info 228196096 on compactor or current records
              at org.apache.activemq.artemis.core.journal.impl.JournalImpl.checkKnownRecordID(JournalImpl.java:1152) [artemis-journal-2.16.0.jar:2.16.0]
              at org.apache.activemq.artemis.core.journal.impl.JournalImpl.appendDeleteRecord(JournalImpl.java:989) [artemis-journal-2.16.0.jar:2.16.0]
              at org.apache.activemq.artemis.core.persistence.impl.journal.AbstractJournalStorageManager.deleteDuplicateID(AbstractJournalStorageManager.java:482) [artemis-server-2.16.0.jar:2.16.0]
              at org.apache.activemq.artemis.core.postoffice.impl.DuplicateIDCacheImpl.addToCacheInMemory(DuplicateIDCacheImpl.java:265) [artemis-server-2.16.0.jar:2.16.0]
              at org.apache.activemq.artemis.core.postoffice.impl.DuplicateIDCacheImpl.access$000(DuplicateIDCacheImpl.java:41) [artemis-server-2.16.0.jar:2.16.0]
              at org.apache.activemq.artemis.core.postoffice.impl.DuplicateIDCacheImpl$AddDuplicateIDOperation.process(DuplicateIDCacheImpl.java:347) [artemis-server-2.16.0.jar:2.16.0]
              at org.apache.activemq.artemis.core.postoffice.impl.DuplicateIDCacheImpl$AddDuplicateIDOperation.beforeCommit(DuplicateIDCacheImpl.java:363) [artemis-server-2.16.0.jar:2.16.0]
              at org.apache.activemq.artemis.core.transaction.impl.TransactionImpl.beforeCommit(TransactionImpl.java:599) [artemis-server-2.16.0
      

      On the client side, message consumption slows down and at some point stops completely.
      A restart lets the broker partially recover, but it still seems to have issues unless it is given a new, clean, empty persistent store.
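
      To check whether the store errors really follow the XA timeouts in time, the broker log can be tallied per minute and per log code. The following is only a minimal sketch: it assumes every log entry starts with a timestamp in the format of the AMQ222103 excerpt above, and that the duplicate-cache error is logged in full as AMQ222055. The function name `tally` and the `WATCHED` table are hypothetical helpers, not part of Artemis.

```python
import re
from collections import Counter

# Captures the timestamp truncated to the minute and the first AMQ code on the line.
# Assumed line shape: "2022-09-13 00:00:02,970 WARN  [logger] AMQ222103: ..."
LINE_RE = re.compile(
    r"^(\d{4}-\d{2}-\d{2} \d{2}:\d{2}):\d{2}[,.]\d{3}\b.*?\b(AMQ\d{6}):"
)

# Codes taken from the excerpts in this report (AMQ prefix assumed for the second).
WATCHED = {
    "AMQ222103": "XA transaction timed out",
    "AMQ222055": "error on deleting duplicate cache",
}


def tally(lines):
    """Count watched log codes, keyed by (minute, code)."""
    counts = Counter()
    for line in lines:
        match = LINE_RE.match(line)
        # Stack-trace continuation lines have no timestamp and are skipped.
        if match and match.group(2) in WATCHED:
            counts[(match.group(1), match.group(2))] += 1
    return counts
```

      Running `tally(open(path))` on the broker log and sorting the keys would show, minute by minute, whether bursts of timeouts precede the first duplicate-cache errors.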

      (Note : it might be similar to ARTEMIS-2373)

      Background :

      • It's a standalone Artemis instance serving as a front for other brokers (connected by bridges, which work fine). It forwards messages submitted by clients to the brokers connected to the application services, and gets back response messages which are consumed by the clients (basically a kind of reverse proxy).
      • It was recently upgraded to 2.24.0 in the hope that this would fix the issue, but it remains identical.
      • It's a production system; the issue has not yet been reproduced on test environments (but it has recurred several times on this production environment).
      • We do not own the client trying to consume the messages and have little information on the specifics of its internals and XA usage.
      • Clients not using XA have used the services for months, even years, without exhibiting this kind of issue.

          People

            Assignee: Unassigned
            Reporter: SL (slx)