Uploaded image for project: 'Qpid'
  1. Qpid
  2. QPID-8514

[Broker-J] High CPU usage and stucked connections

    XMLWordPrintableJSON

Details

    • Bug
    • Status: Closed
    • Minor
    • Resolution: Fixed
    • qpid-java-broker-8.0.4
    • qpid-java-broker-8.0.5
    • Broker-J
    • None

    Description

      There were around 600 connections in state CLOSE__WAIT: 

      [prod01@fdmchd log]$ netstat -Ainet | grep 22101 | grep TIME_WAIT | wc -l
      38
      [prod01@fdmchd log]$ netstat -Ainet | grep 22101 | grep CLOSE_WAIT | wc -l
      625
      [prod01@fdmchd log]$ netstat -Ainet | grep 22101 | grep -v CLOSE_WAIT | grep -V TIME_WAIT | wc -l
      7
      

      CPU usage raises up to 90% and higher.

      Connections couldn't be closed by webgui. Broker log contains message below, but connections were not closed.

      2021-03-26 15:12:28,474 INFO  [Broker-Config] (q.m.c.model_delete) - [mng:3hpU+miq(admin@/127.0.0.1:62743)] [con:751(TEST@/127.0.0.1:40422/default)] CON-1007 : Connection close initiated by operator
      

       Thread dumps shows around 49 threads with same stacktrace related to ring queue

      java.lang.Thread.State: RUNNABLE
      	at org.apache.qpid.server.queue.QueueEntryImpl.addStateChangeListener(QueueEntryImpl.java:674)
      	at org.apache.qpid.server.queue.QueueEntryImpl.acquireOrSteal(QueueEntryImpl.java:316)
      	at org.apache.qpid.server.queue.AbstractQueue.deleteEntry(AbstractQueue.java:1913)
      	at org.apache.qpid.server.queue.AbstractQueue.deleteEntry(AbstractQueue.java:1863)
      	at org.apache.qpid.server.queue.RingOverflowPolicyHandler$Handler.checkOverflow(RingOverflowPolicyHandler.java:102)
      	at org.apache.qpid.server.queue.RingOverflowPolicyHandler$Handler.access$000(RingOverflowPolicyHandler.java:44)
      	at org.apache.qpid.server.queue.RingOverflowPolicyHandler.checkOverflow(RingOverflowPolicyHandler.java:41)
      	at org.apache.qpid.server.queue.AbstractQueue.doEnqueue(AbstractQueue.java:1382)
      	at org.apache.qpid.server.queue.AbstractQueue.enqueue(AbstractQueue.java:1290)
      	at org.apache.qpid.server.message.RoutingResult$1.postCommit(RoutingResult.java:135)
      	at org.apache.qpid.server.txn.AsyncAutoCommitTransaction$3.postCommit(AsyncAutoCommitTransaction.java:309)
      	at org.apache.qpid.server.txn.AsyncCommand.complete(AsyncCommand.java:84)
      	at org.apache.qpid.server.protocol.v1_0.StandardReceivingLinkEndpoint.receiveComplete(StandardReceivingLinkEndpoint.java:588)
      	at org.apache.qpid.server.protocol.v1_0.Session_1_0.lambda$receivedComplete$5(Session_1_0.java:1343)
      	at org.apache.qpid.server.protocol.v1_0.Session_1_0$$Lambda$142/959267941.accept(Unknown Source)
      	at java.lang.Iterable.forEach(Iterable.java:75)
      	at org.apache.qpid.server.protocol.v1_0.Session_1_0.receivedComplete(Session_1_0.java:1343)
      	at org.apache.qpid.server.protocol.v1_0.AMQPConnection_1_0Impl.lambda$receivedComplete$11(AMQPConnection_1_0Impl.java:1349)
      	at org.apache.qpid.server.protocol.v1_0.AMQPConnection_1_0Impl$$Lambda$141/1774823837.run(Unknown Source)
      	at java.security.AccessController.doPrivileged(Native Method)
      	at org.apache.qpid.server.protocol.v1_0.AMQPConnection_1_0Impl.receivedComplete(AMQPConnection_1_0Impl.java:1347)
      	at org.apache.qpid.server.protocol.v1_0.AMQPConnection_1_0Impl.onReceive(AMQPConnection_1_0Impl.java:1328)
      	at org.apache.qpid.server.transport.AbstractAMQPConnection.lambda$received$2(AbstractAMQPConnection.java:576)
      	at org.apache.qpid.server.transport.AbstractAMQPConnection$$Lambda$119/6380061.run(Unknown Source)
      	at java.security.AccessController.doPrivileged(Native Method)
      	at org.apache.qpid.server.transport.AbstractAMQPConnection.received(AbstractAMQPConnection.java:571)
      	at org.apache.qpid.server.transport.MultiVersionProtocolEngine.received(MultiVersionProtocolEngine.java:135)
      	at org.apache.qpid.server.transport.NonBlockingConnection.processAmqpData(NonBlockingConnection.java:611)
      	at org.apache.qpid.server.transport.NonBlockingConnectionPlainDelegate.processData(NonBlockingConnectionPlainDelegate.java:58)
      	at org.apache.qpid.server.transport.NonBlockingConnection.doRead(NonBlockingConnection.java:496)
      	at org.apache.qpid.server.transport.NonBlockingConnection.doWork(NonBlockingConnection.java:270)
      	at org.apache.qpid.server.transport.NetworkConnectionScheduler.processConnection(NetworkConnectionScheduler.java:134)
      	at org.apache.qpid.server.transport.SelectorThread$ConnectionProcessor.processConnection(SelectorThread.java:575)
      	at org.apache.qpid.server.transport.SelectorThread$SelectionTask.performSelect(SelectorThread.java:366)
      	at org.apache.qpid.server.transport.SelectorThread$SelectionTask.run(SelectorThread.java:97)
      	at org.apache.qpid.server.transport.SelectorThread.run(SelectorThread.java:533)
      	at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149)
      	at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)
      	at org.apache.qpid.server.bytebuffer.QpidByteBufferFactory.lambda$null$0(QpidByteBufferFactory.java:464)
      	at org.apache.qpid.server.bytebuffer.QpidByteBufferFactory$$Lambda$10/1487500813.run(Unknown Source)
      	at java.lang.Thread.run(Thread.java:748)
      

      Also ring queue behaves strangely: queue "DeadLetterQueue" has set maximum message depth to 1, but in there were 55 message with Acquired state.
      First message in DeadLetterQueue - 2021-03-25, 16:50:59.381 (00:00 UTC).
      Last message in DeadLetterQueue - 2021-03-25, 16:50:59.683 (00:00 UTC).

      These times correlates to timed out drain request at 16:52:00,410, because default drain request is set 60 seconds:

      ^[[0m^[[0m2021-03-25 16:52:00,408 INFO  [org.apache.qpid.jms.JmsSession] (QpidJMS Connection Executor: ID:986cd915-d072-4711-bd71-50a74aab4be8:1) A JMS MessageConsumer has been closed: JmsConsumerInfo: { ID:986cd915-d072-4711-bd71-50a74aab4be8:1:1:1, destination = broadcast.XX.PRISMA_LIVERiskLimitTrigger }
      ^[[0m^[[33m2021-03-25 16:52:00,410 WARN  [org.apache.camel.component.jms.DefaultJmsMessageListenerContainer] (Camel (camel-1) thread #3 - JmsConsumer[broadcast.XX.PRISMA_LIVERiskLimitTrigger]) Setup of JMS message listener invoker failed for destination 'broadcast.XX.PRISMA_LIVERiskLimitTrigger' - trying to recover. Cause: Remote did not respond to a drain request in time
      ^[[0m^[[33m2021-03-25 16:52:01,414 WARN  [org.apache.camel.component.jms.DefaultJmsMessageListenerContainer] (Camel (camel-1) thread #6 - JmsConsumer[broadcast.XX.PRISMA_LIVERiskLimitTrigger]) Setup of JMS message listener invoker failed for destination 'broadcast.XX.PRISMA_LIVERiskLimitTrigger'
      

      After the problem was triggered broker started to reject client connections:

       2021-03-25 17:55:43,716 INFO  [Broker-Config] (q.m.c.open) - [con:901(/127.0.0.1:56012)] CON-1001 : Open : Destination : amqp(127.0.0.1:22101) : Protocol Version : 1.0
      2021-03-25 17:55:43,745 INFO  [IO-/127.0.0.1:56012] (q.m.c.open) - [con:901(ARP@/127.0.0.1:56012/default)] CON-1001 : Open : Destination : amqp(172.16.75.197:22101) : Protocol Version : 1.0 : Client ID : ID:af50d087-0610-406a-9792-ffa5bf6b68b4:5 : Client Version : 0.51.0 : Client Product : QpidJMS
      2021-03-25 17:55:43,747 INFO  [IO-/127.0.0.1:56012] (q.m.a.denied) - [con:901(ARP@/127.0.0.1:56012/default)] ACL-1002 : Denied : Access Virtualhost NAME=default,VIRTUALHOST_NAME=default
      2021-03-25 17:55:43,755 INFO  [Broker-Config] (q.m.c.dropped_connection) - [con:901(ARP@/127.0.0.1:56012/default)] CON-1004 : Connection dropped
      

      Issue seems to be quite rare: same test ran in a loop over several days and this issue was not triggered.

      Attachments

        1. threaddump.zip
          66 kB
          Daniil Kirilyuk
        2. RingPolicyConcurrencyTest.java
          10 kB
          Alex Rudyy
        3. QPID-8514 - broker-log.zip
          63 kB
          Daniil Kirilyuk

        Activity

          People

            orudyy Alex Rudyy
            daniel.kirilyuk Daniil Kirilyuk
            Votes:
            0 Vote for this issue
            Watchers:
            3 Start watching this issue

            Dates

              Created:
              Updated:
              Resolved: