[QPID-8489] Connection thread looping - ASF JIRA

Details

Type: Bug
Status: Closed
Priority: Minor
Resolution: Fixed
Affects Version/s: qpid-java-broker-8.0.2
Fix Version/s: qpid-java-broker-8.0.3, qpid-java-broker-7.1.11
Component/s: Broker-J
Labels:
None

Description

Error happens quite rarely and is not easy to reproduce. Although the problem was partly fixed by fixing ~~QPID-8477~~, it still can be reproduced. The main symptom is significant increase of CPU usage even when no messages are sent to broker anymore. CPU usage can rise from 30% to 90% and higher, making broker unusable. After such CPU rise the only way to fix broker will be restarting it.

Analysis has shown, that error occurs with CPP proton client in cases when

1) SSL connection is used

2) connection errors on client side are ignored

3) connection is dropped due to the client process termination / network disconnection

Steps to reproduce

Java broker should be installed
Broker should be configured to allow one connection
Prepare certificates
Install Qpid::Proton 0.28.0

wget http://archive.apache.org/dist/qpid/proton/0.28.0/qpid-proton-0.28.0.tar.gz

gunzip qpid-proton-0.28.0.tar.gz

mkdir -p qpid-proton-0.28.0/build && pushd qpid-proton-0.28.0/build && cmake .. && make all && popd

Replace and edit example qpid-proton-0.28.0/cpp/examples/simple_recv.cpp with the one attached

Build again

cd qpid-proton-0.28.0/build

make

Break the broker

./cpp/examples/simple_recv & ./cpp/examples/simple_recv

Connection error

^C <= Hit Ctrl+C to kill process

If CPU usage didn't increased, find the PID of the first simple_recv process using ps-ef | grep simple_recv and kill it using kill -9 PID.

Analysis

CPU usage rises when connection is dropped on the client side or when network is broken between client and broker. The main point is that client isn't well behaved and connection shouldn't be closed correctly.

On broker side connection becomes "orphaned": it is still maintained by broker, but no real reading / writing is performed. Following method calls are performed in an endless loop for each "orphaned" connection:

SelectorThread.performSelect()

SelectorThread.ConnectionProcessor.processConnection()

NetworkConnectionScheduler.processConnection()

NonBlockingConnection.doWork()

As there nothing physically read or written, both methods NonBlockingConnection.doRead() and NonBlockingConnection.doWrite() execute very fast (several milliseconds) without any blocking processes and after that connection is immediately rescheduled for processing in NetworkConnectionScheduler. After that loop repeats.

As the connection lifecycle is normal, there is logged nothing unusual or suspicious (nothing is seen in log at all).

In thread dump (see attachment) there is seen, that utilized are mostly thread with names virtualhost-default-iopool-XX. Typical stacktrace looks like following:

"virtualhost-default-iopool-39" #92 prio=5 os_prio=0 tid=0x00007f47ec335800 nid=0x37196 waiting on condition [0x00007f476a4e3000]"virtualhost-default-iopool-39" #92 prio=5 os_prio=0 tid=0x00007f47ec335800 nid=0x37196 waiting on condition [0x00007f476a4e3000]   java.lang.Thread.State: WAITING (parking) at sun.misc.Unsafe.park(Native Method) - parking to wait for  <0x00000000f39105d0> (a java.util.concurrent.locks.AbstractQueuedSynchronizer$ConditionObject) at java.util.concurrent.locks.LockSupport.park(LockSupport.java:175) at java.util.concurrent.locks.AbstractQueuedSynchronizer$ConditionObject.await(AbstractQueuedSynchronizer.java:2039) at java.util.concurrent.LinkedBlockingQueue.take(LinkedBlockingQueue.java:442) at org.apache.qpid.server.transport.SelectorThread.run(SelectorThread.java:532) at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149) at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624) at org.apache.qpid.server.bytebuffer.QpidByteBufferFactory.lambda$null$0(QpidByteBufferFactory.java:464) at org.apache.qpid.server.bytebuffer.QpidByteBufferFactory$$Lambda$18/422330142.run(Unknown Source) at java.lang.Thread.run(Thread.java:748)}

The main symptom of an error is rising CPU usage, which can reach up to 90% in case, when several connections are "orphaned". Additional factor leading to the problem is disabled keep-alive option for a connection or long keep-alive interval.

Attachments

- Sort By Name
- Sort By Date
- Ascending
- Descending

0001-QPID-8489-Break-an-infinite-connection-processing-lo.patch
10/Dec/20 16:16
2 kB
Alex Rudyy
QPID-8489.log
07/Dec/20 10:04
188 kB
Daniil Kirilyuk
QPID-8489 - java.ssl.debug.log
10/Dec/20 13:53
6 kB
Daniil Kirilyuk
simple_recv.cpp
30/Nov/20 09:05
2 kB
Daniil Kirilyuk
thread-dump.st
30/Nov/20 09:05
130 kB
Daniil Kirilyuk

Issue Links

causes

QPID-8491 Connection leak when using CPP qpid-proton client

Closed

Connection thread looping

Details

Description

Attachments

Attachments

Issue Links

Activity

People

Dates