Details
-
Bug
-
Status: Resolved
-
Major
-
Resolution: Fixed
-
5.11.0, 5.11.1
-
None
-
Platform independent
-
Patch Available
Description
When we run JBoss EAP 6.4 with high load using AMQ 6.2 (ActiveMQ 5.11) everything grinds to a halt after a few hours. We have 170 threads blocked on the same lock:
"default-threads - 1400" #406720 prio=5 os_prio=0 tid=0x00007f1b8402b800 nid=0xfe2d waiting for monitor entry [0x00007f19ccdd5000]
java.lang.Thread.State: BLOCKED (on object monitor)
at org.apache.activemq.TransactionContext.setXid(TransactionContext.java:729)
- waiting to lock <0x000000063ac688b8> (a java.util.HashMap)
at org.apache.activemq.TransactionContext.end(TransactionContext.java:418)
at org.apache.activemq.ra.LocalAndXATransaction.end(LocalAndXATransaction.java:98)
at org.jboss.jca.core.tx.jbossts.XAResourceWrapperImpl.end(XAResourceWrapperImpl.java:118)
at com.arjuna.ats.internal.jta.resources.arjunacore.XAResourceRecord.topLevelPrepare(XAResourceRecord.java:208)
at com.arjuna.ats.arjuna.coordinator.BasicAction.doPrepare(BasicAction.java:2530)
at com.arjuna.ats.arjuna.coordinator.BasicAction.doPrepare(BasicAction.java:2497)
at com.arjuna.ats.arjuna.coordinator.BasicAction.prepare(BasicAction.java:2074) - locked <0x000000067461c090> (a com.arjuna.ats.internal.jta.transaction.arjunacore.AtomicAction)
at com.arjuna.ats.arjuna.coordinator.BasicAction.End(BasicAction.java:1485) - locked <0x000000067461c090> (a com.arjuna.ats.internal.jta.transaction.arjunacore.AtomicAction)
at com.arjuna.ats.arjuna.coordinator.TwoPhaseCoordinator.end(TwoPhaseCoordinator.java:98)
at com.arjuna.ats.arjuna.AtomicAction.commit(AtomicAction.java:162)
This lock is owned by a thread that is waiting for another lock:
"default-threads - 1381" #404073 prio=5 os_prio=0 tid=0x00007f1a6403c000 nid=0x7d49 waiting on condition [0x00007f19d8a91000]
java.lang.Thread.State: WAITING (parking)
at sun.misc.Unsafe.park(Native Method)
- parking to wait for <0x000000063ab2a260> (a java.util.concurrent.locks.ReentrantLock$NonfairSync)
at java.util.concurrent.locks.LockSupport.park(LockSupport.java:175)
at java.util.concurrent.locks.AbstractQueuedSynchronizer.parkAndCheckInterrupt(AbstractQueuedSynchronizer.java:836)
at java.util.concurrent.locks.AbstractQueuedSynchronizer.acquireQueued(AbstractQueuedSynchronizer.java:870)
at java.util.concurrent.locks.AbstractQueuedSynchronizer.acquire(AbstractQueuedSynchronizer.java:1199)
at java.util.concurrent.locks.ReentrantLock$NonfairSync.lock(ReentrantLock.java:209)
at java.util.concurrent.locks.ReentrantLock.lock(ReentrantLock.java:285)
at org.apache.activemq.transport.MutexTransport.oneway(MutexTransport.java:66)
at org.apache.activemq.transport.ResponseCorrelator.oneway(ResponseCorrelator.java:60)
at org.apache.activemq.ActiveMQConnection.doAsyncSendPacket(ActiveMQConnection.java:1309)
at org.apache.activemq.ActiveMQConnection.asyncSendPacket(ActiveMQConnection.java:1303)
at org.apache.activemq.ActiveMQSession.asyncSendPacket(ActiveMQSession.java:2008)
at org.apache.activemq.ActiveMQSession$5.afterRollback(ActiveMQSession.java:973)
at org.apache.activemq.TransactionContext.afterRollback(TransactionContext.java:162)
at org.apache.activemq.TransactionContext.prepare(TransactionContext.java:482) - locked <0x000000063ac688b8> (a java.util.HashMap)
at org.apache.activemq.ra.LocalAndXATransaction.prepare(LocalAndXATransaction.java:130)
at com.arjuna.ats.internal.jta.resources.arjunacore.XAResourceRecord.topLevelPrepare(XAResourceRecord.java:213)
at com.arjuna.ats.arjuna.coordinator.BasicAction.doPrepare(BasicAction.java:2530)
at com.arjuna.ats.arjuna.coordinator.BasicAction.doPrepare(BasicAction.java:2497)
The locked object is ENDED_XA_TRANSACTION_CONTEXTS in TransactionContext. The thread that has the lock calls afterRollback without first releasing it and afterRollback does not return promptly (not at all in fact, we need to restart the server to fix this). The same can happen with afterCommit, which is also called from within a synchronized block causing a potential deadlock.
I think I have a fix for this and will create a pull request in github, or if you prefer I can send a patch? It simply means moving the external calls out from the synchronized blocks and would both remove the very real deadlock risks and improve performance.