ActiveMQ Artemis / ARTEMIS-4114

Broker deadlock occurs when restarting another broker in the cluster

Details

    • Type: Bug
    • Status: Closed
    • Priority: Critical
    • Resolution: Fixed
    • Affects Version/s: 2.19.1
    • Fix Version/s: 2.28.0
    • Component/s: Broker
    • Labels: None

    Description

      Broker deadlock occurs when restarting another broker in the cluster.

      When one broker in the cluster (4 brokers) is restarted, another broker fails and restarts as well.
      The brokers are connected via staticConnectors, and a scaleDown policy is also configured:

          <ha-policy>
             <live-only>
                <scale-down>
                   <connectors>
                      <connector-ref>ART.EL.CLS1-connector</connector-ref>
                      <connector-ref>ART.EL.CLS2-connector</connector-ref>
                      <connector-ref>ART.EL.CLS3-connector</connector-ref>
                   </connectors>
                </scale-down>
             </live-only>
          </ha-policy>

      Logs of the failed broker:

      Deadlock detected!
      
      "Thread-16 (ActiveMQ-server-org.apache.activemq.artemis.core.server.impl.ActiveMQServerImpl$6@46cc127b)" Id=82 BLOCKED on org.apache.activemq.artemis.core.server.cluster.impl.ClusterConnectionBridge@62661d03 owned by "Thread-142 (ActiveMQ-client-global-threads)" Id=10066
          at org.apache.activemq.artemis.core.server.cluster.impl.BridgeImpl.handle(BridgeImpl.java:620)
          -  blocked on org.apache.activemq.artemis.core.server.cluster.impl.ClusterConnectionBridge@62661d03
          at org.apache.activemq.artemis.core.server.impl.QueueImpl.handle(QueueImpl.java:3897)
          -  locked org.apache.activemq.artemis.core.server.impl.QueueImpl@59041573
          at org.apache.activemq.artemis.core.server.impl.QueueImpl.deliver(QueueImpl.java:3061)
          -  locked org.apache.activemq.artemis.core.server.impl.QueueImpl@59041573
          at org.apache.activemq.artemis.core.server.impl.QueueImpl$DeliverRunner.run(QueueImpl.java:4205)
          at org.apache.activemq.artemis.utils.actors.OrderedExecutor.doTask(OrderedExecutor.java:42)
          at org.apache.activemq.artemis.utils.actors.OrderedExecutor.doTask(OrderedExecutor.java:31)
          at org.apache.activemq.artemis.utils.actors.ProcessorBase.executePendingTasks(ProcessorBase.java:65)
          at org.apache.activemq.artemis.utils.actors.ProcessorBase$$Lambda$134/0x00000008002b5840.run(Unknown Source)
          at java.base@11.0.9/java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1128)
          at java.base@11.0.9/java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:628)
          at org.apache.activemq.artemis.utils.ActiveMQThreadFactory$1.run(ActiveMQThreadFactory.java:118)    
      Number of locked synchronizers = 2
          - java.util.concurrent.ThreadPoolExecutor$Worker@ffceecd
          - java.util.concurrent.locks.ReentrantLock$NonfairSync@561fd6c1
      
      "Thread-142 (ActiveMQ-client-global-threads)" Id=10066 BLOCKED on org.apache.activemq.artemis.core.server.impl.QueueImpl@59041573 owned by "Thread-16 (ActiveMQ-server-org.apache.activemq.artemis.core.server.impl.ActiveMQServerImpl$6@46cc127b)" Id=82
          at org.apache.activemq.artemis.core.server.impl.QueueImpl.iterQueue(QueueImpl.java:2158)
          -  blocked on org.apache.activemq.artemis.core.server.impl.QueueImpl@59041573
          at org.apache.activemq.artemis.core.server.impl.QueueImpl.moveReferencesBetweenSnFQueues(QueueImpl.java:2649)
          at org.apache.activemq.artemis.core.server.cluster.impl.BridgeImpl.scaleDown(BridgeImpl.java:746)
          -  locked org.apache.activemq.artemis.core.server.cluster.impl.ClusterConnectionBridge@62661d03
          at org.apache.activemq.artemis.core.server.cluster.impl.BridgeImpl.connectionFailed(BridgeImpl.java:728)
          at org.apache.activemq.artemis.core.client.impl.ClientSessionFactoryImpl.callSessionFailureListeners(ClientSessionFactoryImpl.java:774)
          at org.apache.activemq.artemis.core.client.impl.ClientSessionFactoryImpl.failoverOrReconnect(ClientSessionFactoryImpl.java:709)
          at org.apache.activemq.artemis.core.client.impl.ClientSessionFactoryImpl.handleConnectionFailure(ClientSessionFactoryImpl.java:544)
          at org.apache.activemq.artemis.core.client.impl.ClientSessionFactoryImpl.access$600(ClientSessionFactoryImpl.java:75)
          at org.apache.activemq.artemis.core.client.impl.ClientSessionFactoryImpl$DelegatingFailureListener.connectionFailed(ClientSessionFactoryImpl.java:1317)
          at org.apache.activemq.artemis.spi.core.protocol.AbstractRemotingConnection.callFailureListeners(AbstractRemotingConnection.java:78)
          at org.apache.activemq.artemis.core.protocol.core.impl.RemotingConnectionImpl.fail(RemotingConnectionImpl.java:222)
          at org.apache.activemq.artemis.core.client.impl.ClientSessionFactoryImpl$CloseRunnable.run(ClientSessionFactoryImpl.java:1091)
          at org.apache.activemq.artemis.utils.actors.OrderedExecutor.doTask(OrderedExecutor.java:42)
          at org.apache.activemq.artemis.utils.actors.OrderedExecutor.doTask(OrderedExecutor.java:31)
          at org.apache.activemq.artemis.utils.actors.ProcessorBase.executePendingTasks(ProcessorBase.java:65)
          at org.apache.activemq.artemis.utils.actors.ProcessorBase$$Lambda$134/0x00000008002b5840.run(Unknown Source)
          at java.base@11.0.9/java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1128)
          at java.base@11.0.9/java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:628)
          at org.apache.activemq.artemis.utils.ActiveMQThreadFactory$1.run(ActiveMQThreadFactory.java:118)    
      Number of locked synchronizers = 3
          - java.util.concurrent.ThreadPoolExecutor$Worker@21768e7
          - java.util.concurrent.locks.ReentrantLock$NonfairSync@32848485
          - java.util.concurrent.locks.ReentrantLock$NonfairSync@6efeeadb
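
The two stack traces above show a lock-order inversion: Thread-16 holds the QueueImpl monitor (taken in QueueImpl.deliver) and waits for the ClusterConnectionBridge monitor in BridgeImpl.handle, while Thread-142 holds the ClusterConnectionBridge monitor (taken in BridgeImpl.scaleDown) and waits for the QueueImpl monitor in QueueImpl.iterQueue. The following is a minimal sketch of the same pattern in plain Java, not Artemis code. The class name, the `queue` and `bridge` objects and the thread names are hypothetical stand-ins. Detection uses the standard ThreadMXBean.findDeadlockedThreads() call, which may or may not be how the broker itself produces the "Deadlock detected!" message.

```java
import java.lang.management.ManagementFactory;
import java.lang.management.ThreadMXBean;
import java.util.concurrent.CountDownLatch;

public class LockOrderInversionDemo {

    /**
     * Starts two daemon threads that take two monitors in opposite order,
     * then polls the JVM for deadlocked threads.
     * Returns the ids of the deadlocked threads, or an empty array if none
     * were reported within roughly five seconds.
     */
    public static long[] provokeAndDetect() throws InterruptedException {
        // Hypothetical stand-ins for the QueueImpl and ClusterConnectionBridge monitors.
        final Object queue = new Object();
        final Object bridge = new Object();
        // Ensures each thread holds its first monitor before asking for the second.
        final CountDownLatch bothHoldFirstLock = new CountDownLatch(2);

        // Mirrors Thread-16: queue first, then bridge.
        Thread deliver = new Thread(
                () -> lockInOrder(queue, bridge, bothHoldFirstLock), "deliver");
        // Mirrors Thread-142: bridge first, then queue.
        Thread scaleDown = new Thread(
                () -> lockInOrder(bridge, queue, bothHoldFirstLock), "scale-down");
        // Daemon threads, so the stuck threads do not keep the JVM alive.
        deliver.setDaemon(true);
        scaleDown.setDaemon(true);
        deliver.start();
        scaleDown.start();

        ThreadMXBean threads = ManagementFactory.getThreadMXBean();
        for (int i = 0; i < 100; i++) {
            long[] ids = threads.findDeadlockedThreads(); // null when no deadlock exists
            if (ids != null) {
                return ids;
            }
            Thread.sleep(50);
        }
        return new long[0];
    }

    private static void lockInOrder(Object first, Object second, CountDownLatch latch) {
        synchronized (first) {
            latch.countDown();
            try {
                latch.await(); // wait until the other thread holds its first monitor
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt();
                return;
            }
            synchronized (second) {
                // Not reached: the other thread already owns this monitor.
            }
        }
    }

    public static void main(String[] args) throws InterruptedException {
        long[] ids = provokeAndDetect();
        System.out.println("Deadlocked threads: " + ids.length);
    }
}
```

In general, a cycle like this goes away when every code path takes the two monitors in the same order, or when the scale-down path does not call into the queue while still holding the bridge monitor. Whether the fix shipped in 2.28.0 takes either form is not stated in this report.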

      Logs of the restarted broker and of the failed broker are attached.

      The broker failed two minutes after the restart.

      Attachments

        1. fallen_broker_logs.txt
          134 kB
          Alexander
        2. restarted_broker_logs.txt
          20 kB
          Alexander

            People

              Assignee: Unassigned
              Reporter: Luchkin Alexander
              Votes: 0
              Watchers: 2
