Description
This issue was hit in the test MultiThreadRandomReattachTest. Several client threads do some work while a connection failure is simulated. The test expects all threads to finish without exceptions.
Because of this issue, some client threads occasionally fail with the exception AMQ119014: Timed out after waiting 30,000 ms for response when sending packet XXX.
I found that this exception is caused by a temporary deadlock during failover on the client side. The following two threads block each other.
"Thread-7" Id=29 TIMED_WAITING on java.util.concurrent.locks.AbstractQueuedSynchronizer$ConditionObject@1d03220
    at sun.misc.Unsafe.park(Native Method)
    - waiting on java.util.concurrent.locks.AbstractQueuedSynchronizer$ConditionObject@1d03220
    at java.util.concurrent.locks.LockSupport.parkNanos(LockSupport.java:215)
    at java.util.concurrent.locks.AbstractQueuedSynchronizer$ConditionObject.await(AbstractQueuedSynchronizer.java:2163)
    at org.apache.activemq.artemis.utils.ConcurrentUtil.await(ConcurrentUtil.java:37)
    at org.apache.activemq.artemis.core.protocol.core.impl.ChannelImpl.waitForFailOver(ChannelImpl.java:256)
    at org.apache.activemq.artemis.core.protocol.core.impl.ChannelImpl.send(ChannelImpl.java:283)
    - locked java.lang.Object@938196
    at org.apache.activemq.artemis.core.protocol.core.impl.ChannelImpl.send(ChannelImpl.java:229)
    at org.apache.activemq.artemis.core.protocol.core.impl.ActiveMQSessionContext.sendProducerCreditsMessage(ActiveMQSessionContext.java:421)
    at org.apache.activemq.artemis.core.client.impl.ClientSessionImpl.sendProducerCreditsMessage(ClientSessionImpl.java:1342)
    at org.apache.activemq.artemis.core.client.impl.ClientProducerCreditsImpl.requestCredits(ClientProducerCreditsImpl.java:209)
    at org.apache.activemq.artemis.core.client.impl.ClientProducerCreditsImpl.checkCredits(ClientProducerCreditsImpl.java:204)
    at org.apache.activemq.artemis.core.client.impl.ClientProducerCreditsImpl.init(ClientProducerCreditsImpl.java:71)
    at org.apache.activemq.artemis.core.client.impl.ClientProducerCreditManagerImpl.getCredits(ClientProducerCreditManagerImpl.java:79)
    - locked org.apache.activemq.artemis.core.client.impl.ClientProducerCreditManagerImpl@f7a5dc
    at org.apache.activemq.artemis.core.client.impl.ClientSessionImpl.getCredits(ClientSessionImpl.java:1347)
    - locked org.apache.activemq.artemis.core.client.impl.ClientSessionImpl@10867c8
    at org.apache.activemq.artemis.core.client.impl.ClientProducerImpl.<init>(ClientProducerImpl.java:102)
    at org.apache.activemq.artemis.core.client.impl.ClientSessionImpl.internalCreateProducer(ClientSessionImpl.java:1817)
    at org.apache.activemq.artemis.core.client.impl.ClientSessionImpl.createProducer(ClientSessionImpl.java:740)
    at org.apache.activemq.artemis.core.client.impl.ClientSessionImpl.createProducer(ClientSessionImpl.java:730)
    at org.apache.activemq.artemis.tests.integration.cluster.reattach.MultiThreadRandomReattachTestBase.doTestB(MultiThreadRandomReattachTestBase.java:398)
    at org.apache.activemq.artemis.tests.integration.cluster.reattach.MultiThreadRandomReattachTestBase$2.run(MultiThreadRandomReattachTestBase.java:84)
    at org.apache.activemq.artemis.tests.integration.cluster.reattach.MultiThreadReattachSupportTestBase$1Runner.run(MultiThreadReattachSupportTestBase.java:104)
"Timer-0" Id=9 BLOCKED on org.apache.activemq.artemis.core.client.impl.ClientSessionImpl@10867c8 owned by "Thread-7" Id=29
    at org.apache.activemq.artemis.core.client.impl.ClientSessionImpl.handleFailover(ClientSessionImpl.java:1206)
    - blocked on org.apache.activemq.artemis.core.client.impl.ClientSessionImpl@10867c8
    at org.apache.activemq.artemis.core.client.impl.ClientSessionFactoryImpl.reconnectSessions(ClientSessionFactoryImpl.java:771)
    at org.apache.activemq.artemis.core.client.impl.ClientSessionFactoryImpl.failoverOrReconnect(ClientSessionFactoryImpl.java:614)
    at org.apache.activemq.artemis.core.client.impl.ClientSessionFactoryImpl.handleConnectionFailure(ClientSessionFactoryImpl.java:504)
    at org.apache.activemq.artemis.core.client.impl.ClientSessionFactoryImpl.access$600(ClientSessionFactoryImpl.java:72)
    at org.apache.activemq.artemis.core.client.impl.ClientSessionFactoryImpl$DelegatingFailureListener.connectionFailed(ClientSessionFactoryImpl.java:1175)
    at org.apache.activemq.artemis.spi.core.protocol.AbstractRemotingConnection.callFailureListeners(AbstractRemotingConnection.java:70)
    at org.apache.activemq.artemis.core.protocol.core.impl.RemotingConnectionImpl.fail(RemotingConnectionImpl.java:209)
    at org.apache.activemq.artemis.spi.core.protocol.AbstractRemotingConnection.fail(AbstractRemotingConnection.java:213)
    at org.apache.activemq.artemis.tests.integration.cluster.reattach.MultiThreadReattachSupportTestBase$Failer.run(MultiThreadReattachSupportTestBase.java:220)
    - locked org.apache.activemq.artemis.tests.integration.cluster.reattach.MultiThreadReattachSupportTestBase$Failer@16d859
    at java.util.TimerThread.mainLoop(Timer.java:555)
    at java.util.TimerThread.run(Timer.java:505)

    Number of locked synchronizers = 1
    - java.util.concurrent.locks.ReentrantLock$NonfairSync@1ac8dee
The first thread holds the ClientSessionImpl lock in the getCredits method, which tries to send a packet and therefore waits until the connection is re-established or failover to the backup completes.
public final class ClientSessionImpl implements ClientSessionInternal, FailureListener {
   @Override
   public synchronized ClientProducerCredits getCredits(final SimpleString address, final boolean anon) {
      ClientProducerCredits credits = producerCreditManager.getCredits(address, anon, sessionContext);
      return credits;
   }
}
The second thread is responsible for handling the connection failure and performing the reconnection or failover. However, it is blocked by the first thread because it also needs the ClientSessionImpl lock.
public final class ClientSessionImpl implements ClientSessionInternal, FailureListener {
   @Override
   public void handleFailover(final RemotingConnection backupConnection, ActiveMQException cause) {
      synchronized (this) {
         if (closed) {
            return;
         }
         ...
      }
   }
}
This situation lasts until some other thread throws an exception because it never receives a response to its blocking packet, since the connection was not re-established.
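The locking pattern described above can be reproduced outside Artemis with a small standalone sketch. Everything in it is hypothetical and only mirrors the shape of the problem: Session stands in for ClientSessionImpl, sendHoldingLock for the synchronized getCredits path, and handleFailover for the failover handler that needs the same monitor. It is not the Artemis code and not a proposed fix.

```java
import java.util.concurrent.CountDownLatch;
import java.util.concurrent.TimeUnit;

class TemporaryDeadlockSketch {

    // Hypothetical stand-in for ClientSessionImpl; not the real class.
    static final class Session {
        private final CountDownLatch failoverDone = new CountDownLatch(1);
        private final CountDownLatch senderWaiting = new CountDownLatch(1);

        // Like getCredits: waits for failover while holding the session monitor.
        synchronized boolean sendHoldingLock(long timeoutMs) throws InterruptedException {
            return awaitFailover(timeoutMs);
        }

        // Contrast case: the same wait, but without holding the session monitor.
        boolean sendWithoutLock(long timeoutMs) throws InterruptedException {
            return awaitFailover(timeoutMs);
        }

        private boolean awaitFailover(long timeoutMs) throws InterruptedException {
            senderWaiting.countDown(); // tell the failover thread the sender is now waiting
            return failoverDone.await(timeoutMs, TimeUnit.MILLISECONDS);
        }

        // Like handleFailover: must take the session monitor before completing failover.
        void handleFailover() throws InterruptedException {
            senderWaiting.await(); // start failover only once the sender is waiting
            synchronized (this) {
                failoverDone.countDown();
            }
        }
    }

    // Returns true if the sender saw failover complete before its timeout expired.
    static boolean run(boolean holdLock, long timeoutMs) throws InterruptedException {
        Session session = new Session();
        Thread failover = new Thread(() -> {
            try {
                session.handleFailover();
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt();
            }
        }, "failover");
        failover.start();
        boolean completed = holdLock
                ? session.sendHoldingLock(timeoutMs)
                : session.sendWithoutLock(timeoutMs);
        failover.join(); // failover can proceed once the sender has released the monitor
        return completed;
    }

    public static void main(String[] args) throws InterruptedException {
        System.out.println("holding lock: " + run(true, 200));
        System.out.println("without lock: " + run(false, 5000));
    }
}
```

With holdLock set to true the sender always times out, because the failover thread cannot enter the synchronized block until the sender gives up and releases the monitor. That is the temporary deadlock: it resolves itself, but only after the timeout. With holdLock set to false the failover thread is free to proceed, so the wait normally completes well before the timeout.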