Description
Please have a look at the attached stress test. It consists of 12 reader threads that create and destroy cms::MessageConsumers in a loop, and 4 writer threads that send cms::TextMessages in a loop. The reader threads deadlock in less than 1 minute on my machine. To run it, simply issue:
make test
I traced the issue down to inconsistent mutex acquisition order by the following two threads:
thread 7 (Thread 0x7fa691fce700 (LWP 28088))
(gdb) bt
#0 0x00007fa68eb405bc in pthread_cond_wait@@GLIBC_2.3.2 () from /lib64/libpthread.so.0
#1 0x00007fa68fea1c06 in (anonymous namespace)::doMonitorEnter (monitor=0x7fa668007ff0, thread=0x7fa64c0071d0) at decaf/internal/util/concurrent/Threading.cpp:664
#2 0x00007fa68ff15dcb in decaf::util::concurrent::Lock::lock (this=0x7fa691fcd6b0) at decaf/util/concurrent/Lock.cpp:54
#3 0x00007fa68ff15ee5 in decaf::util::concurrent::Lock::Lock (this=<value optimized out>, object=<value optimized out>, intiallyLocked=<value optimized out>)
at decaf/util/concurrent/Lock.cpp:32
#4 0x00007fa68fc4da38 in activemq::core::kernels::ActiveMQConsumerKernel::dispatch (this=0x7fa680010a20, dispatch=...) at activemq/core/kernels/ActiveMQConsumerKernel.cpp:1527
#5 0x00007fa68fc06584 in activemq::core::ActiveMQSessionExecutor::dispatch (this=0x7fa65c005300, dispatch=...) at activemq/core/ActiveMQSessionExecutor.cpp:156
#6 0x00007fa68fc06f15 in activemq::core::ActiveMQSessionExecutor::iterate (this=0x7fa65c005300) at activemq/core/ActiveMQSessionExecutor.cpp:181
#7 0x00007fa68fd3fdf5 in activemq::threads::DedicatedTaskRunner::run (this=0x7fa64c004ab0) at activemq/threads/DedicatedTaskRunner.cpp:141
Waiting for internal->listenerMutex, that is held by thread 6
Acquisition order:
ActiveMQConsumerKernel::internal->unconsumedMessages
ActiveMQConsumerKernel::internal->listenerMutex
(gdb) frame 1
(gdb) p/x monitor->owner->handle
$9 = 0x7fa691fbd700
thread 6 (Thread 0x7fa691fbd700 (LWP 28091))
(gdb) bt
#0 0x00007fa68eb405bc in pthread_cond_wait@@GLIBC_2.3.2 () from /lib64/libpthread.so.0
#1 0x00007fa68fea1c06 in (anonymous namespace)::doMonitorEnter (monitor=0x7fa6680071e0, thread=0x7fa6246a13d0) at decaf/internal/util/concurrent/Threading.cpp:664
#2 0x00007fa68ff15dcb in decaf::util::concurrent::Lock::lock (this=0x7fa691fbc6f0) at decaf/util/concurrent/Lock.cpp:54
#3 0x00007fa68ff15ee5 in decaf::util::concurrent::Lock::Lock (this=<value optimized out>, object=<value optimized out>, intiallyLocked=<value optimized out>)
at decaf/util/concurrent/Lock.cpp:32
#4 0x00007fa68fc3c21a in activemq::core::SimplePriorityMessageDispatchChannel::dequeueNoWait (this=0x7fa68000eee0) at activemq/core/SimplePriorityMessageDispatchChannel.cpp:95
#5 0x00007fa68fc40f3c in activemq::core::kernels::ActiveMQConsumerKernel::iterate (this=0x7fa680010a20) at activemq/core/kernels/ActiveMQConsumerKernel.cpp:1701
#6 0x00007fa68fc8ab23 in activemq::core::kernels::ActiveMQSessionKernel::iterateConsumers (this=0x7fa65c003960) at activemq/core/kernels/ActiveMQSessionKernel.cpp:1370
#7 0x00007fa68fc06eb9 in activemq::core::ActiveMQSessionExecutor::iterate (this=0x7fa65c005300) at activemq/core/ActiveMQSessionExecutor.cpp:173
#8 0x00007fa68fd3fdf5 in activemq::threads::DedicatedTaskRunner::run (this=0x7fa624603e00) at activemq/threads/DedicatedTaskRunner.cpp:141
Waiting for mutex, that is held by thread 7
Acquisition order:
ActiveMQSessionKernel::config->consumerLock
ActiveMQConsumerKernel::internal->listenerMutex
ActiveMQConsumerKernel::internal->unconsumedMessages
(gdb) frame 1
(gdb) p/x monitor->owner->handle
$10 = 0x7fa691fce700
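For illustration only, the lock-order inversion between the two threads can be reduced to the sketch below. It is not the attached patch and does not use the decaf types: ConsumerState, dispatchPath, iteratePath and runBothPaths are hypothetical names, and plain std::mutex objects stand in for listenerMutex and unconsumedMessages. Each path names the two mutexes in the opposite order, as in the backtraces, but acquires them together through std::lock, which avoids the deadlock regardless of argument order.

```cpp
#include <functional>
#include <mutex>
#include <thread>

// Hypothetical stand-ins for the two monitors named in the report.
// These are plain std::mutex objects, not the decaf implementations.
struct ConsumerState {
    std::mutex listenerMutex;
    std::mutex unconsumedMessages;
    long dispatched = 0;  // only modified while both mutexes are held
};

// Names unconsumedMessages first, then listenerMutex.
void dispatchPath(ConsumerState& s, int n) {
    for (int i = 0; i < n; ++i) {
        std::unique_lock<std::mutex> a(s.unconsumedMessages, std::defer_lock);
        std::unique_lock<std::mutex> b(s.listenerMutex, std::defer_lock);
        std::lock(a, b);  // acquires both without risking deadlock
        ++s.dispatched;
    }
}

// Names listenerMutex first, then unconsumedMessages.
void iteratePath(ConsumerState& s, int n) {
    for (int i = 0; i < n; ++i) {
        std::unique_lock<std::mutex> a(s.listenerMutex, std::defer_lock);
        std::unique_lock<std::mutex> b(s.unconsumedMessages, std::defer_lock);
        std::lock(a, b);  // opposite argument order is still safe
        ++s.dispatched;
    }
}

// Runs both paths concurrently and returns the shared counter.
long runBothPaths(int n) {
    ConsumerState s;
    std::thread t1(dispatchPath, std::ref(s), n);
    std::thread t2(iteratePath, std::ref(s), n);
    t1.join();
    t2.join();
    return s.dispatched;
}
```

If each path instead locked its first mutex and then blocked on the second, the two threads could end up waiting on each other exactly as threads 6 and 7 do above. Whether simultaneous acquisition or a single fixed order is the right fix inside ActiveMQConsumerKernel depends on code not shown here.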
With the attached patch, the stress test no longer deadlocks. However, it then starts consuming memory in BitSet::ensureCapacity(), called via ActiveMQConnection::isDuplicate(). See the attached massif output for details.