Details
Type: Bug
Status: Closed
Priority: Minor
Resolution: Fixed
Affects Version: qpid-java-broker-8.0.4
Fix Version: None
Description
There were around 600 connections in state CLOSE_WAIT:
[prod01@fdmchd log]$ netstat -Ainet | grep 22101 | grep TIME_WAIT | wc -l
38
[prod01@fdmchd log]$ netstat -Ainet | grep 22101 | grep CLOSE_WAIT | wc -l
625
[prod01@fdmchd log]$ netstat -Ainet | grep 22101 | grep -v CLOSE_WAIT | grep -v TIME_WAIT | wc -l
7
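For reference, the same count can also be taken from inside a JVM by parsing /proc/net/tcp, where ports and connection states are hex-encoded and CLOSE_WAIT is state 08. This is a Linux-only illustrative sketch, not broker code; unlike the grep above it matches the local port only:

import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Paths;

public class CloseWaitCounter {
    public static void main(String[] args) throws IOException {
        String portHex = String.format("%04X", 22101); // broker port, hex-encoded
        long closeWait = Files.lines(Paths.get("/proc/net/tcp"))
                .skip(1)                         // column header line
                .map(line -> line.trim().split("\\s+"))
                // fields: sl, local_address, rem_address, st, ...
                .filter(f -> f.length > 3 && f[1].endsWith(":" + portHex))
                .filter(f -> "08".equals(f[3]))  // 08 = CLOSE_WAIT
                .count();
        System.out.println("CLOSE_WAIT on port 22101: " + closeWait);
    }
}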
CPU usage rises to 90% and higher.
Connections could not be closed via the web GUI. The broker log contains the message below, but the connections were not closed:
2021-03-26 15:12:28,474 INFO [Broker-Config] (q.m.c.model_delete) - [mng:3hpU+miq(admin@/127.0.0.1:62743)] [con:751(TEST@/127.0.0.1:40422/default)] CON-1007 : Connection close initiated by operator
Thread dumps show around 49 threads with the same stack trace, related to the ring queue:
java.lang.Thread.State: RUNNABLE
    at org.apache.qpid.server.queue.QueueEntryImpl.addStateChangeListener(QueueEntryImpl.java:674)
    at org.apache.qpid.server.queue.QueueEntryImpl.acquireOrSteal(QueueEntryImpl.java:316)
    at org.apache.qpid.server.queue.AbstractQueue.deleteEntry(AbstractQueue.java:1913)
    at org.apache.qpid.server.queue.AbstractQueue.deleteEntry(AbstractQueue.java:1863)
    at org.apache.qpid.server.queue.RingOverflowPolicyHandler$Handler.checkOverflow(RingOverflowPolicyHandler.java:102)
    at org.apache.qpid.server.queue.RingOverflowPolicyHandler$Handler.access$000(RingOverflowPolicyHandler.java:44)
    at org.apache.qpid.server.queue.RingOverflowPolicyHandler.checkOverflow(RingOverflowPolicyHandler.java:41)
    at org.apache.qpid.server.queue.AbstractQueue.doEnqueue(AbstractQueue.java:1382)
    at org.apache.qpid.server.queue.AbstractQueue.enqueue(AbstractQueue.java:1290)
    at org.apache.qpid.server.message.RoutingResult$1.postCommit(RoutingResult.java:135)
    at org.apache.qpid.server.txn.AsyncAutoCommitTransaction$3.postCommit(AsyncAutoCommitTransaction.java:309)
    at org.apache.qpid.server.txn.AsyncCommand.complete(AsyncCommand.java:84)
    at org.apache.qpid.server.protocol.v1_0.StandardReceivingLinkEndpoint.receiveComplete(StandardReceivingLinkEndpoint.java:588)
    at org.apache.qpid.server.protocol.v1_0.Session_1_0.lambda$receivedComplete$5(Session_1_0.java:1343)
    at org.apache.qpid.server.protocol.v1_0.Session_1_0$$Lambda$142/959267941.accept(Unknown Source)
    at java.lang.Iterable.forEach(Iterable.java:75)
    at org.apache.qpid.server.protocol.v1_0.Session_1_0.receivedComplete(Session_1_0.java:1343)
    at org.apache.qpid.server.protocol.v1_0.AMQPConnection_1_0Impl.lambda$receivedComplete$11(AMQPConnection_1_0Impl.java:1349)
    at org.apache.qpid.server.protocol.v1_0.AMQPConnection_1_0Impl$$Lambda$141/1774823837.run(Unknown Source)
    at java.security.AccessController.doPrivileged(Native Method)
    at org.apache.qpid.server.protocol.v1_0.AMQPConnection_1_0Impl.receivedComplete(AMQPConnection_1_0Impl.java:1347)
    at org.apache.qpid.server.protocol.v1_0.AMQPConnection_1_0Impl.onReceive(AMQPConnection_1_0Impl.java:1328)
    at org.apache.qpid.server.transport.AbstractAMQPConnection.lambda$received$2(AbstractAMQPConnection.java:576)
    at org.apache.qpid.server.transport.AbstractAMQPConnection$$Lambda$119/6380061.run(Unknown Source)
    at java.security.AccessController.doPrivileged(Native Method)
    at org.apache.qpid.server.transport.AbstractAMQPConnection.received(AbstractAMQPConnection.java:571)
    at org.apache.qpid.server.transport.MultiVersionProtocolEngine.received(MultiVersionProtocolEngine.java:135)
    at org.apache.qpid.server.transport.NonBlockingConnection.processAmqpData(NonBlockingConnection.java:611)
    at org.apache.qpid.server.transport.NonBlockingConnectionPlainDelegate.processData(NonBlockingConnectionPlainDelegate.java:58)
    at org.apache.qpid.server.transport.NonBlockingConnection.doRead(NonBlockingConnection.java:496)
    at org.apache.qpid.server.transport.NonBlockingConnection.doWork(NonBlockingConnection.java:270)
    at org.apache.qpid.server.transport.NetworkConnectionScheduler.processConnection(NetworkConnectionScheduler.java:134)
    at org.apache.qpid.server.transport.SelectorThread$ConnectionProcessor.processConnection(SelectorThread.java:575)
    at org.apache.qpid.server.transport.SelectorThread$SelectionTask.performSelect(SelectorThread.java:366)
    at org.apache.qpid.server.transport.SelectorThread$SelectionTask.run(SelectorThread.java:97)
    at org.apache.qpid.server.transport.SelectorThread.run(SelectorThread.java:533)
    at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149)
    at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)
    at org.apache.qpid.server.bytebuffer.QpidByteBufferFactory.lambda$null$0(QpidByteBufferFactory.java:464)
    at org.apache.qpid.server.bytebuffer.QpidByteBufferFactory$$Lambda$10/1487500813.run(Unknown Source)
    at java.lang.Thread.run(Thread.java:748)
The ring queue also behaves strangely: the queue "DeadLetterQueue" has its maximum message depth set to 1, yet it contained 55 messages in the Acquired state.
First message in DeadLetterQueue - 2021-03-25, 16:50:59.381 (00:00 UTC).
Last message in DeadLetterQueue - 2021-03-25, 16:50:59.683 (00:00 UTC).
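The stack trace above suggests a plausible mechanism: RingOverflowPolicyHandler.checkOverflow deletes entries from the head of the queue, and AbstractQueue.deleteEntry must first acquire (or steal) each entry via QueueEntryImpl.acquireOrSteal. If entries are already acquired by consumers that never settle them (for example, consumers behind the stuck CLOSE_WAIT connections), the depth can never fall back to the limit, and the overflow check is repeated in vain on every enqueue. A deliberately simplified, illustrative model of that effect (hypothetical names, not broker source):

import java.util.Iterator;
import java.util.LinkedList;
import java.util.List;

public class RingOverflowSketch {
    enum State { AVAILABLE, ACQUIRED }

    public static void main(String[] args) {
        final int maxDepth = 1;
        List<State> queue = new LinkedList<>();
        // 55 entries held (Acquired) by consumers that never settle them.
        for (int i = 0; i < 55; i++) {
            queue.add(State.ACQUIRED);
        }
        queue.add(State.AVAILABLE); // newly enqueued message triggers the check

        // Overflow pass: delete oldest entries until depth <= maxDepth.
        // Only AVAILABLE entries can be acquired for deletion; ACQUIRED ones
        // are skipped, so the depth never reaches the limit and the pass is
        // re-run on every subsequent enqueue.
        Iterator<State> it = queue.iterator();
        while (it.hasNext() && queue.size() > maxDepth) {
            if (it.next() == State.AVAILABLE) {
                it.remove();
            }
        }
        System.out.println("maxDepth=" + maxDepth + " but depth=" + queue.size());
    }
}

Run as-is, this prints "maxDepth=1 but depth=55", mirroring the 55 Acquired messages observed on the depth-1 DeadLetterQueue.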
These times correlate with the drain request that timed out at 16:52:00,410, since the default drain timeout is 60 seconds:
2021-03-25 16:52:00,408 INFO [org.apache.qpid.jms.JmsSession] (QpidJMS Connection Executor: ID:986cd915-d072-4711-bd71-50a74aab4be8:1) A JMS MessageConsumer has been closed: JmsConsumerInfo: { ID:986cd915-d072-4711-bd71-50a74aab4be8:1:1:1, destination = broadcast.XX.PRISMA_LIVERiskLimitTrigger }
2021-03-25 16:52:00,410 WARN [org.apache.camel.component.jms.DefaultJmsMessageListenerContainer] (Camel (camel-1) thread #3 - JmsConsumer[broadcast.XX.PRISMA_LIVERiskLimitTrigger]) Setup of JMS message listener invoker failed for destination 'broadcast.XX.PRISMA_LIVERiskLimitTrigger' - trying to recover. Cause: Remote did not respond to a drain request in time
2021-03-25 16:52:01,414 WARN [org.apache.camel.component.jms.DefaultJmsMessageListenerContainer] (Camel (camel-1) thread #6 - JmsConsumer[broadcast.XX.PRISMA_LIVERiskLimitTrigger]) Setup of JMS message listener invoker failed for destination 'broadcast.XX.PRISMA_LIVERiskLimitTrigger'
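The 60 seconds matches the QpidJMS client's drain timeout, which can be tuned through the amqp.drainTimeout connection URI option (in milliseconds, default 60000). A minimal client-side sketch, with host, port, and credentials as placeholders:

import javax.jms.Connection;
import org.apache.qpid.jms.JmsConnectionFactory;

public class DrainTimeoutExample {
    public static void main(String[] args) throws Exception {
        // amqp.drainTimeout: how long the client waits for the remote peer to
        // answer a drain request before failing the consumer.
        JmsConnectionFactory factory = new JmsConnectionFactory(
                "amqp://localhost:22101?amqp.drainTimeout=120000");
        Connection connection = factory.createConnection("guest", "guest");
        try {
            connection.start();
            // ... create sessions and consumers as usual ...
        } finally {
            connection.close();
        }
    }
}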
After the problem was triggered, the broker started to reject client connections:
2021-03-25 17:55:43,716 INFO [Broker-Config] (q.m.c.open) - [con:901(/127.0.0.1:56012)] CON-1001 : Open : Destination : amqp(127.0.0.1:22101) : Protocol Version : 1.0
2021-03-25 17:55:43,745 INFO [IO-/127.0.0.1:56012] (q.m.c.open) - [con:901(ARP@/127.0.0.1:56012/default)] CON-1001 : Open : Destination : amqp(172.16.75.197:22101) : Protocol Version : 1.0 : Client ID : ID:af50d087-0610-406a-9792-ffa5bf6b68b4:5 : Client Version : 0.51.0 : Client Product : QpidJMS
2021-03-25 17:55:43,747 INFO [IO-/127.0.0.1:56012] (q.m.a.denied) - [con:901(ARP@/127.0.0.1:56012/default)] ACL-1002 : Denied : Access Virtualhost NAME=default,VIRTUALHOST_NAME=default
2021-03-25 17:55:43,755 INFO [Broker-Config] (q.m.c.dropped_connection) - [con:901(ARP@/127.0.0.1:56012/default)] CON-1004 : Connection dropped
The issue seems to be quite rare: the same test was run in a loop over several days without triggering it again.