  1. ActiveMQ C++ Client
  2. AMQCPP-376

Deadlock in IOTransport when a network of brokers restarts and failover is used.


    Details

    • Type: Bug
    • Status: Resolved
    • Priority: Major
    • Resolution: Fixed
    • Affects Version/s: 3.4.0
    • Fix Version/s: 3.5.0
    • Component/s: Other C++ Clients
    • Labels:
      None
    • Environment:

      ActiveMQ-CPP ver - 3.4.0
      Broker 5.3.1
      Machine: Linux mars 2.6.18-128.el5 #1 SMP Wed Dec 17 11:41:38 EST 2008 x86_64 x86_64 x86_64 GNU/Linux
      gcc version: 4.1.2 20080704 (Red Hat 4.1.2-44)

      Description

      The problem description:
      We run a network of four brokers.
      Broker URI: 'failover://(tcp://10.10.13.20:61616,tcp://10.10.13.22:61616,tcp://10.10.13.24:61616,tcp://10.10.13.26:61616)?randomize=true&connection.closeTimeout=10000&transport.soTimeout=3000&timeout=3000&connection.useAsyncSend=true&connection.alwaysSyncSend=false'

      The producer loads the brokers with 1000 messages/sec. We are testing the producer's behavior during failover by restarting all four brokers one after another while messages are being sent, and we get the deadlock shown below.

      Note: the problem was tested only with a network of brokers.

      The backtrace (only relevant threads):

      Thread 16 (process 26892):
      #0 0x00000032ef00ce74 in __lll_lock_wait () from /lib64/libpthread.so.0
      #1 0x00000032ef008874 in _L_lock_106 () from /lib64/libpthread.so.0
      #2 0x00000032ef0082e0 in pthread_mutex_lock () from /lib64/libpthread.so.0
      #3 0x0000000000dc5a04 in decaf::internal::util::concurrent::MutexImpl::lock (handle=0xfefdd38) at decaf/internal/util/concurrent/unix/MutexImpl.cpp:77
      #4 0x0000000000bd9092 in decaf::util::concurrent::Mutex::lock (this=0xff54100) at decaf/util/concurrent/Mutex.cpp:111
      #5 0x0000000000d51f3f in decaf::util::AbstractCollection<decaf::lang::Pointer<activemq::transport::Transport, decaf::util::concurrent::atomic::AtomicRefCounter> >::lock (this=0xff540f8) at ./decaf/util/AbstractCollection.h:331
      #6 0x0000000000bd86c9 in decaf::util::concurrent::Lock::lock (this=0x4c7b9c90) at decaf/util/concurrent/Lock.cpp:54
      #7 0x0000000000bd883a in Lock (this=0x4c7b9c90, object=0xff54188, intiallyLocked=true) at decaf/util/concurrent/Lock.cpp:32
      #8 0x0000000000d47a77 in activemq::transport::failover::CloseTransportsTask::add (this=0xff540e8, transport=@0x4c7b9cf0) at activemq/transport/failover/CloseTransportsTask.cpp:46
      #9 0x0000000000b1b748 in activemq::transport::failover::FailoverTransport::handleTransportFailure (this=0xffed498, error=@0x4c7b9ee0) at activemq/transport/failover/FailoverTransport.cpp:483
      #10 0x0000000000b41a06 in activemq::transport::failover::FailoverTransportListener::onException (this=0xfde2e58, ex=@0x4c7b9ee0) at activemq/transport/failover/FailoverTransportListener.cpp:76
      #11 0x0000000000d34813 in activemq::transport::TransportFilter::fire (this=0x10627498, ex=@0x4c7b9ee0) at activemq/transport/TransportFilter.cpp:54
      #12 0x0000000000d34841 in activemq::transport::TransportFilter::onException (this=0x10627498, ex=@0x4c7b9ee0) at activemq/transport/TransportFilter.cpp:46
      #13 0x0000000000d34813 in activemq::transport::TransportFilter::fire (this=0xfeeb558, ex=@0x4c7b9ee0) at activemq/transport/TransportFilter.cpp:54
      #14 0x0000000000d34841 in activemq::transport::TransportFilter::onException (this=0xfeeb558, ex=@0x4c7b9ee0) at activemq/transport/TransportFilter.cpp:46
      #15 0x0000000000d554c8 in activemq::transport::inactivity::InactivityMonitor::onException (this=0xfeeb558, ex=@0x4c7b9ee0) at activemq/transport/inactivity/InactivityMonitor.cpp:312
      #16 0x0000000000d34813 in activemq::transport::TransportFilter::fire (this=0x1020c118, ex=@0x4c7b9ee0) at activemq/transport/TransportFilter.cpp:54
      #17 0x0000000000d34841 in activemq::transport::TransportFilter::onException (this=0x1020c118, ex=@0x4c7b9ee0) at activemq/transport/TransportFilter.cpp:46
      #18 0x0000000000d326f2 in activemq::transport::IOTransport::fire (this=0xdce10b8, ex=@0x4c7b9ee0) at activemq/transport/IOTransport.cpp:87
      #19 0x0000000000d32982 in activemq::transport::IOTransport::run (this=0xdce10b8) at activemq/transport/IOTransport.cpp:264
      #20 0x0000000000baad49 in decaf::lang::ThreadProperties::runCallback (properties=0x105871d8) at decaf/lang/Thread.cpp:137
      #21 0x0000000000ba9068 in threadWorker (arg=0x105871d8) at decaf/lang/Thread.cpp:190
      #22 0x00000032ef006367 in start_thread () from /lib64/libpthread.so.0
      #23 0x00000032ee4d30ad in clone () from /lib64/libc.so.6

      Thread 9 (process 14470):
      #0 0x00000032ef00a899 in pthread_cond_wait@@GLIBC_2.3.2 () from /lib64/libpthread.so.0
      #1 0x0000000000dc54b3 in decaf::internal::util::concurrent::ConditionImpl::wait (condition=0x1072d2b8) at decaf/internal/util/concurrent/unix/ConditionImpl.cpp:101
      #2 0x0000000000bd9033 in decaf::util::concurrent::Mutex::wait (this=0x105871d8) at decaf/util/concurrent/Mutex.cpp:126
      #3 0x0000000000ba8538 in decaf::lang::Thread::join (this=0x12a4a418) at decaf/lang/Thread.cpp:452
      #4 0x0000000000d32c28 in activemq::transport::IOTransport::close (this=0xdce10b8) at activemq/transport/IOTransport.cpp:222
      #5 0x0000000000d34bfe in activemq::transport::TransportFilter::close (this=0x1020c118) at activemq/transport/TransportFilter.cpp:106
      #6 0x0000000000b47d3a in activemq::transport::tcp::TcpTransport::close (this=0x1020c118) at activemq/transport/tcp/TcpTransport.cpp:74
      #7 0x0000000000d34bfe in activemq::transport::TransportFilter::close (this=0xfeeb558) at activemq/transport/TransportFilter.cpp:106
      #8 0x0000000000d554ec in activemq::transport::inactivity::InactivityMonitor::close (this=0xfeeb558) at activemq/transport/inactivity/InactivityMonitor.cpp:300
      #9 0x0000000000d77867 in activemq::wireformat::openwire::OpenWireFormatNegotiator::close (this=0x10627498) at activemq/wireformat/openwire/OpenWireFormatNegotiator.cpp:248
      #10 0x0000000000d478ff in activemq::transport::failover::CloseTransportsTask::iterate (this=0xff540e8) at activemq/transport/failover/CloseTransportsTask.cpp:75
      #11 0x0000000000d25891 in activemq::threads::CompositeTaskRunner::iterate (this=0xddc0108) at activemq/threads/CompositeTaskRunner.cpp:173
      #12 0x0000000000d25ae4 in activemq::threads::CompositeTaskRunner::run (this=0xddc0108) at activemq/threads/CompositeTaskRunner.cpp:107
      #13 0x0000000000baad49 in decaf::lang::ThreadProperties::runCallback (properties=0xfeeb2b8) at decaf/lang/Thread.cpp:137
      #14 0x0000000000ba9068 in threadWorker (arg=0xfeeb2b8) at decaf/lang/Thread.cpp:190
      #15 0x00000032ef006367 in start_thread () from /lib64/libpthread.so.0
      #16 0x00000032ee4d30ad in clone () from /lib64/libc.so.6

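      As you can see, Thread 16 is blocked in __lll_lock_wait, trying to enter synchronized( &transports ) in activemq::transport::failover::CloseTransportsTask::add (frame #8).

      The synchronized( &transports ) lock is held by Thread 9 under activemq::threads::CompositeTaskRunner::iterate (frames #10 and #11). Thread 9 is itself blocked in pthread_cond_wait inside decaf::lang::Thread::join, called from IOTransport::close, and that wait can only be signalled when Thread 16 exits: both threads refer to the same IOTransport (this=0xdce10b8) and the same thread properties object (0x105871d8). Each thread is waiting for the other, so neither can proceed.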
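
The cycle described above (one thread holds the collection lock while joining a second thread, and that second thread needs the same lock before it can exit) can be sketched with standard C++ threads. This is only an illustration of one common way to break such a cycle, namely swapping the pending list out under the lock and closing the transports after the lock is released. It is not the patch applied in ActiveMQ-CPP, and `FakeTransport`, `CloseTask` and `runScenario` are hypothetical names that do not exist in the library.

```cpp
#include <atomic>
#include <cassert>
#include <functional>
#include <memory>
#include <mutex>
#include <thread>
#include <vector>

// Hypothetical stand-in for a transport that owns a reader thread.
class FakeTransport {
public:
    // onFailure runs on the reader thread, but only after close() has begun,
    // which forces the interleaving seen in the backtrace.
    void start(std::function<void()> onFailure) {
        reader_ = std::thread([this, onFailure] {
            while (!closing_.load()) std::this_thread::yield();
            onFailure();
        });
    }
    // Like IOTransport::close in the backtrace: waits for the reader thread.
    void close() {
        closing_.store(true);
        if (reader_.joinable()) reader_.join();
        closed_.store(true);
    }
    bool closed() const { return closed_.load(); }
private:
    std::thread reader_;
    std::atomic<bool> closing_{false};
    std::atomic<bool> closed_{false};
};

// Hypothetical stand-in for a task that closes queued transports.
class CloseTask {
public:
    void add(std::shared_ptr<FakeTransport> t) {
        std::lock_guard<std::mutex> guard(mutex_);
        pending_.push_back(std::move(t));
    }
    // Closes everything queued so far and returns how many were closed.
    int iterate() {
        std::vector<std::shared_ptr<FakeTransport>> batch;
        {
            std::lock_guard<std::mutex> guard(mutex_);
            batch.swap(pending_);  // take the work, then release the lock
        }
        // The lock is NOT held here, so a reader thread calling add()
        // can finish and exit, and the join inside close() returns.
        for (auto& t : batch) t->close();
        return static_cast<int>(batch.size());
    }
private:
    std::mutex mutex_;
    std::vector<std::shared_ptr<FakeTransport>> pending_;
};

// Transport a is queued for closing; while it is being closed, its reader
// thread queues transport b. A later pass then closes b.
int runScenario() {
    CloseTask task;
    auto a = std::make_shared<FakeTransport>();
    auto b = std::make_shared<FakeTransport>();
    a->start([&task, b] { task.add(b); });
    task.add(a);
    int closed = task.iterate();  // closes a; its reader adds b meanwhile
    closed += task.iterate();     // closes b
    assert(a->closed() && b->closed());
    return closed;
}
```

Calling `runScenario()` completes and reports two closed transports. If `iterate()` instead held `mutex_` for the whole loop, the reader thread would block in `add()` while `close()` waited in `join()`, which is the same shape as Threads 16 and 9 above.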

      Kind regards,
      Igor.

        Attachments

        1. bt_1.txt
          29 kB
          igor khaustov
        2. bt_2.txt
          37 kB
          igor khaustov
        3. activemq35g0.patch
          6 kB
          Bob Wiegand
        4. 35g0.txt
          0.6 kB
          Bob Wiegand


            People

            • Assignee:
              tabish Timothy A. Bish
              Reporter:
              igorkh igor khaustov