ActiveMQ Artemis / ARTEMIS-1506

Synchronization issue during failover in ClientSessionImpl


Details

    • Type: Bug
    • Status: Resolved
    • Priority: Major
    • Resolution: Fixed
    • Affects Version/s: 2.3.0
    • Fix Version/s: 2.5.0
    • Component/s: Broker
    • Labels: None

    Description

      This issue was hit in the test MultiThreadRandomReattachTest. Several client threads perform work while a connection failure is simulated, and the test expects all threads to finish without exceptions.

      Because of this issue, some client threads occasionally fail with the exception AMQ119014: Timed out after waiting 30,000 ms for response when sending packet XXX.

      I found that this exception is caused by a temporary deadlock during failover on the client side. The following two threads block each other:

      "Thread-7" Id=29 TIMED_WAITING on java.util.concurrent.locks.AbstractQueuedSynchronizer$ConditionObject@1d03220
      	at sun.misc.Unsafe.park(Native Method)
      	-  waiting on java.util.concurrent.locks.AbstractQueuedSynchronizer$ConditionObject@1d03220
      	at java.util.concurrent.locks.LockSupport.parkNanos(LockSupport.java:215)
      	at java.util.concurrent.locks.AbstractQueuedSynchronizer$ConditionObject.await(AbstractQueuedSynchronizer.java:2163)
      	at org.apache.activemq.artemis.utils.ConcurrentUtil.await(ConcurrentUtil.java:37)
      	at org.apache.activemq.artemis.core.protocol.core.impl.ChannelImpl.waitForFailOver(ChannelImpl.java:256)
      	at org.apache.activemq.artemis.core.protocol.core.impl.ChannelImpl.send(ChannelImpl.java:283)
      	-  locked java.lang.Object@938196
      	at org.apache.activemq.artemis.core.protocol.core.impl.ChannelImpl.send(ChannelImpl.java:229)
      	at org.apache.activemq.artemis.core.protocol.core.impl.ActiveMQSessionContext.sendProducerCreditsMessage(ActiveMQSessionContext.java:421)
      	at org.apache.activemq.artemis.core.client.impl.ClientSessionImpl.sendProducerCreditsMessage(ClientSessionImpl.java:1342)
      	at org.apache.activemq.artemis.core.client.impl.ClientProducerCreditsImpl.requestCredits(ClientProducerCreditsImpl.java:209)
      	at org.apache.activemq.artemis.core.client.impl.ClientProducerCreditsImpl.checkCredits(ClientProducerCreditsImpl.java:204)
      	at org.apache.activemq.artemis.core.client.impl.ClientProducerCreditsImpl.init(ClientProducerCreditsImpl.java:71)
      	at org.apache.activemq.artemis.core.client.impl.ClientProducerCreditManagerImpl.getCredits(ClientProducerCreditManagerImpl.java:79)
      	-  locked org.apache.activemq.artemis.core.client.impl.ClientProducerCreditManagerImpl@f7a5dc
      	at org.apache.activemq.artemis.core.client.impl.ClientSessionImpl.getCredits(ClientSessionImpl.java:1347)
      	-  locked org.apache.activemq.artemis.core.client.impl.ClientSessionImpl@10867c8
      	at org.apache.activemq.artemis.core.client.impl.ClientProducerImpl.<init>(ClientProducerImpl.java:102)
      	at org.apache.activemq.artemis.core.client.impl.ClientSessionImpl.internalCreateProducer(ClientSessionImpl.java:1817)
      	at org.apache.activemq.artemis.core.client.impl.ClientSessionImpl.createProducer(ClientSessionImpl.java:740)
      	at org.apache.activemq.artemis.core.client.impl.ClientSessionImpl.createProducer(ClientSessionImpl.java:730)
      	at org.apache.activemq.artemis.tests.integration.cluster.reattach.MultiThreadRandomReattachTestBase.doTestB(MultiThreadRandomReattachTestBase.java:398)
      	at org.apache.activemq.artemis.tests.integration.cluster.reattach.MultiThreadRandomReattachTestBase$2.run(MultiThreadRandomReattachTestBase.java:84)
      	at org.apache.activemq.artemis.tests.integration.cluster.reattach.MultiThreadReattachSupportTestBase$1Runner.run(MultiThreadReattachSupportTestBase.java:104)
      
      "Timer-0" Id=9 BLOCKED on org.apache.activemq.artemis.core.client.impl.ClientSessionImpl@10867c8 owned by "Thread-7" Id=29
      	at org.apache.activemq.artemis.core.client.impl.ClientSessionImpl.handleFailover(ClientSessionImpl.java:1206)
      	-  blocked on org.apache.activemq.artemis.core.client.impl.ClientSessionImpl@10867c8
      	at org.apache.activemq.artemis.core.client.impl.ClientSessionFactoryImpl.reconnectSessions(ClientSessionFactoryImpl.java:771)
      	at org.apache.activemq.artemis.core.client.impl.ClientSessionFactoryImpl.failoverOrReconnect(ClientSessionFactoryImpl.java:614)
      	at org.apache.activemq.artemis.core.client.impl.ClientSessionFactoryImpl.handleConnectionFailure(ClientSessionFactoryImpl.java:504)
      	at org.apache.activemq.artemis.core.client.impl.ClientSessionFactoryImpl.access$600(ClientSessionFactoryImpl.java:72)
      	at org.apache.activemq.artemis.core.client.impl.ClientSessionFactoryImpl$DelegatingFailureListener.connectionFailed(ClientSessionFactoryImpl.java:1175)
      	at org.apache.activemq.artemis.spi.core.protocol.AbstractRemotingConnection.callFailureListeners(AbstractRemotingConnection.java:70)
      	at org.apache.activemq.artemis.core.protocol.core.impl.RemotingConnectionImpl.fail(RemotingConnectionImpl.java:209)
      	at org.apache.activemq.artemis.spi.core.protocol.AbstractRemotingConnection.fail(AbstractRemotingConnection.java:213)
      	at org.apache.activemq.artemis.tests.integration.cluster.reattach.MultiThreadReattachSupportTestBase$Failer.run(MultiThreadReattachSupportTestBase.java:220)
      	-  locked org.apache.activemq.artemis.tests.integration.cluster.reattach.MultiThreadReattachSupportTestBase$Failer@16d859
      	at java.util.TimerThread.mainLoop(Timer.java:555)
      	at java.util.TimerThread.run(Timer.java:505)
      
      	Number of locked synchronizers = 1
      	- java.util.concurrent.locks.ReentrantLock$NonfairSync@1ac8dee
      

      The first thread holds the ClientSessionImpl lock in the method getCredits, which tries to send a packet and therefore waits until the connection is reconnected or fails over to the backup.

      public final class ClientSessionImpl implements ClientSessionInternal, FailureListener {
      
         @Override
         public synchronized ClientProducerCredits getCredits(final SimpleString address, final boolean anon) {
            ClientProducerCredits credits = producerCreditManager.getCredits(address, anon, sessionContext);
      
            return credits;
         }
      }
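      The wait in the first stack (ChannelImpl.waitForFailOver calling ConcurrentUtil.await on a Condition) can be pictured with the minimal sketch below. It is a hypothetical, simplified model, not the Artemis implementation: the class FailoverGate and its methods beginFailover, endFailover and awaitFailover are invented names. A sending thread parks on a Condition until the failure-handling thread clears the failover flag, or until the timeout elapses.

```java
import java.util.concurrent.TimeUnit;
import java.util.concurrent.locks.Condition;
import java.util.concurrent.locks.ReentrantLock;

// Hypothetical model of a "wait for failover" gate; not the Artemis ChannelImpl code.
public class FailoverGate {
   private final ReentrantLock lock = new ReentrantLock();
   private final Condition failoverDone = lock.newCondition();
   private boolean failingOver; // guarded by lock

   // Called by the failure-handling thread when failover starts.
   public void beginFailover() {
      lock.lock();
      try {
         failingOver = true;
      } finally {
         lock.unlock();
      }
   }

   // Called by the failure-handling thread once the connection is restored.
   public void endFailover() {
      lock.lock();
      try {
         failingOver = false;
         failoverDone.signalAll(); // wake every parked sender
      } finally {
         lock.unlock();
      }
   }

   // Called by a sending thread: true if failover finished (or was not running), false on timeout.
   public boolean awaitFailover(long timeoutMillis) {
      long remaining = TimeUnit.MILLISECONDS.toNanos(timeoutMillis);
      lock.lock();
      try {
         while (failingOver) {
            if (remaining <= 0) {
               return false;
            }
            remaining = failoverDone.awaitNanos(remaining); // releases lock while parked
         }
         return true;
      } catch (InterruptedException e) {
         Thread.currentThread().interrupt(); // preserve interrupt status
         return false;
      } finally {
         lock.unlock();
      }
   }

   public static void main(String[] args) {
      FailoverGate gate = new FailoverGate();
      gate.beginFailover();
      // Nobody calls endFailover(), so this gives up after 100 ms: prints completed=false
      System.out.println("completed=" + gate.awaitFailover(100));
      gate.endFailover();
      // No failover in progress, so this returns immediately: prints completed=true
      System.out.println("completed=" + gate.awaitFailover(100));
   }
}
```

      The important property is that the Condition only releases its own lock while parked; any other lock the sender already holds (here, the ClientSessionImpl monitor) stays held for the whole wait.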
      

      The second thread is responsible for handling the connection failure and performing the reconnection or failover. However, it is blocked by the first thread because it also requires the ClientSessionImpl lock.

      public final class ClientSessionImpl implements ClientSessionInternal, FailureListener {
      
         @Override
         public void handleFailover(final RemotingConnection backupConnection, ActiveMQException cause) {
            synchronized (this) {
               if (closed) {
                  return;
               }
               ...
            }
         }
      }
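      The "Timer-0" frame shows the failover thread BLOCKED on the session monitor owned by "Thread-7". The hypothetical sketch below (Session and handleFailover are illustrative names, not the Artemis classes) reproduces just that part: while one thread holds a Session object's monitor, a second thread entering the synchronized (this) block of handleFailover reports Thread.State.BLOCKED until the monitor is released.

```java
import java.util.concurrent.TimeUnit;

// Hypothetical illustration; Session and handleFailover are not the Artemis classes.
public class BlockedFailoverDemo {

   static final class Session {
      private boolean failoverHandled; // guarded by the Session monitor

      void handleFailover() {
         synchronized (this) { // same monitor a synchronized method would use
            failoverHandled = true;
         }
      }

      synchronized boolean isFailoverHandled() {
         return failoverHandled;
      }
   }

   // Holds the session monitor, starts a failover thread, and reports the state it reaches.
   public static Thread.State observeFailoverThreadState() {
      Session session = new Session();
      Thread failover = new Thread(session::handleFailover, "failover");
      Thread.State observed;
      synchronized (session) { // plays the role of Thread-7 inside getCredits
         failover.start();
         long deadline = System.nanoTime() + TimeUnit.SECONDS.toNanos(5);
         do {
            observed = failover.getState();
            if (observed == Thread.State.BLOCKED) {
               break;
            }
            Thread.yield();
         } while (System.nanoTime() < deadline);
      }
      // The monitor is released here, so the failover thread can now complete.
      try {
         failover.join();
      } catch (InterruptedException e) {
         Thread.currentThread().interrupt();
      }
      if (!session.isFailoverHandled()) {
         throw new IllegalStateException("failover thread did not run");
      }
      return observed;
   }

   public static void main(String[] args) {
      System.out.println("failover thread state while monitor is held: " + observeFailoverThreadState());
   }
}
```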
      

      This situation lasts until some other thread throws an exception because it does not receive a response to its blocking packet, since the connection has not been reconnected.
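      Putting both halves together, the following hypothetical model (not Artemis code; Session, getCredits, handleFailover and the holdMonitorWhileWaiting switch are illustrative) reproduces the temporary deadlock: the sender waits on a channel Condition while holding the session monitor, and the failover thread needs that monitor before it can signal. With holdMonitorWhileWaiting = true the sender only returns after its timeout and reports failure; with false the failover thread gets the monitor, signals the Condition, and the sender succeeds. The second variant is one way to break the cycle, shown for illustration only; it is not necessarily the change that resolved this issue.

```java
import java.util.concurrent.CountDownLatch;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.locks.Condition;
import java.util.concurrent.locks.ReentrantLock;

// Hypothetical model of the lock shape in this issue; not Artemis code.
public class TemporaryDeadlockDemo {

   static final class Session {
      private final ReentrantLock channelLock = new ReentrantLock();
      private final Condition failoverDone = channelLock.newCondition();
      private final CountDownLatch senderWaiting = new CountDownLatch(1);
      private final boolean holdMonitorWhileWaiting;
      private boolean failingOver = true; // the connection has just failed

      Session(boolean holdMonitorWhileWaiting) {
         this.holdMonitorWhileWaiting = holdMonitorWhileWaiting;
      }

      // Mimics getCredits -> send -> waitForFailOver. True if failover finished in time.
      boolean getCredits(long timeoutMillis) {
         if (holdMonitorWhileWaiting) {
            synchronized (this) { // like the synchronized ClientSessionImpl.getCredits
               return waitForFailover(timeoutMillis);
            }
         }
         return waitForFailover(timeoutMillis); // session monitor not held while blocking
      }

      // Mimics handleFailover: needs the session monitor before it can finish failover.
      void handleFailover() {
         synchronized (this) {
            channelLock.lock();
            try {
               failingOver = false;
               failoverDone.signalAll();
            } finally {
               channelLock.unlock();
            }
         }
      }

      private boolean waitForFailover(long timeoutMillis) {
         long remaining = TimeUnit.MILLISECONDS.toNanos(timeoutMillis);
         channelLock.lock();
         try {
            senderWaiting.countDown(); // tell the driver the sender is about to park
            while (failingOver) {
               if (remaining <= 0) {
                  return false; // timed out, analogous to AMQ119014
               }
               remaining = failoverDone.awaitNanos(remaining);
            }
            return true;
         } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            return false;
         } finally {
            channelLock.unlock();
         }
      }
   }

   // Starts a sender, then a failover thread once the sender is parked; returns the sender's result.
   public static boolean run(boolean holdMonitorWhileWaiting, long timeoutMillis) {
      Session session = new Session(holdMonitorWhileWaiting);
      boolean[] result = new boolean[1];
      Thread sender = new Thread(() -> result[0] = session.getCredits(timeoutMillis), "sender");
      Thread failover = new Thread(session::handleFailover, "failover");
      sender.start();
      try {
         session.senderWaiting.await();
         failover.start();
         sender.join();
         failover.join();
      } catch (InterruptedException e) {
         Thread.currentThread().interrupt();
      }
      return result[0];
   }

   public static void main(String[] args) {
      System.out.println("monitor held while waiting -> sender succeeded: " + run(true, 500));
      System.out.println("monitor not held           -> sender succeeded: " + run(false, 500));
   }
}
```

      The CountDownLatch makes the ordering deterministic: the failover thread starts only after the sender is already parked on the Condition, which matches the situation captured in the thread dump.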


            People

              Assignee: Unassigned
              Reporter: Erich Duda
              Votes: 0
              Watchers: 3
