Qpid / QPID-2674

Heartbeats can cause segfaults under TCP on Linux

    Details

    • Type: Bug
    • Status: Closed
    • Priority: Blocker
    • Resolution: Fixed
    • Affects Version/s: None
    • Fix Version/s: 0.8
    • Component/s: C++ Broker, C++ Client
    • Labels:
      None

      Description

      If a queued callback is waiting to be processed during a writeable
      (or read-writeable) event, and the writeable callback, i.e. AsynchIO::writeable(),
      takes the path through line 536 of AsynchIO.cpp, then AsynchIOHandler::closedSocket()
      is called, which deletes the AsynchIOHandler instance. The thread then goes on to
      process the queued callbacks in DispatchHandle, which are invoked on the now-deleted
      AsynchIOHandler.
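
The ordering described above can be modelled in a few lines. This is only a sketch under stated assumptions: `Handler`, `Dispatcher` and `runScenario` are hypothetical stand-ins, not the Qpid classes, and a `std::weak_ptr` is used so the late callback can report that its target is gone instead of dereferencing freed memory as the real code does.

```cpp
#include <functional>
#include <memory>
#include <queue>
#include <string>
#include <vector>

// Hypothetical stand-in for AsynchIOHandler: per-connection state.
struct Handler {
    std::vector<std::string>& events;
    explicit Handler(std::vector<std::string>& e) : events(e) {}
    void eof() { events.push_back("eof on live handler"); }
};

// Hypothetical stand-in for DispatchHandle: runs the event callback, then
// drains the callbacks that were queued before the event arrived.
struct Dispatcher {
    std::queue<std::function<void()>> callbacks;

    void processEvent(const std::function<void()>& writeable) {
        writeable();                   // may delete the handler
        while (!callbacks.empty()) {   // the queue is drained regardless
            callbacks.front()();
            callbacks.pop();
        }
    }
};

std::vector<std::string> runScenario() {
    std::vector<std::string> events;
    auto handler = std::make_shared<Handler>(events);
    std::weak_ptr<Handler> observer = handler;  // lets the callback detect deletion

    Dispatcher d;
    // Queued callback. In the real code this holds a raw pointer and would
    // touch freed memory; here it only records what it finds.
    d.callbacks.push([observer, &events] {
        if (auto h = observer.lock()) {
            h->eof();
        } else {
            events.push_back("queued callback ran after handler was deleted");
        }
    });

    // Writeable callback that takes the "socket closed" path.
    d.processEvent([&handler, &events] {
        events.push_back("closedSocket: deleting handler");
        handler.reset();
    });
    return events;
}
```

Calling `runScenario()` records the deletion first and the late callback second, which is the same ordering that leads to the crash in `AsynchIOHandler::disconnect` in the trace below.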

      Results in cores like:

      Core was generated by `/home/gordon/qpid/cpp/build/src/.libs/lt-qpidd
      --load-module /home/gordon/qpid/'.
      Program terminated with signal 11, Segmentation fault.
      [New process 9483]
      [New process 9516]
      [New process 9490]
      [New process 9489]
      [New process 9488]
      [New process 9487]
      [New process 9486]
      [New process 9484]
      #0 0x00002aaefd5e999d in qpid::sys::AsynchIOHandler::disconnect
      (this=0x1a4f0880) at ../../src/qpid/sys/AsynchIOHandler.cpp:194
      194 if (codec) codec->closed();
      (gdb) bt
      #0 0x00002aaefd5e999d in qpid::sys::AsynchIOHandler::disconnect
      (this=0x1a4f0880) at ../../src/qpid/sys/AsynchIOHandler.cpp:194
      #1 0x00002aaefd5e9c29 in qpid::sys::AsynchIOHandler::eof (this=0x2aaab1b330c0,
      a=@0x1a8778a0) at ../../src/qpid/sys/AsynchIOHandler.cpp:177
      #2 0x00002aaefd51ccbf in boost::function1<void, qpid::sys::AsynchIO&,
      std::allocator<boost::function_base> >::operator() (this=0x2aaefd5e9c20,
      a0=@0x1a8778a0) at /usr/include/boost/function/function_template.hpp:576
      #3 0x00002aaefd51c873 in
      boost::detail::function::void_function_obj_invoker1<boost::_bi::bind_t<void,
      boost::_mfi::mf1<void, qpid::sys::posix::AsynchIO, boost::function1<void,
      qpid::sys::AsynchIO&, std::allocator<boost::function_base> > >,
      boost::_bi::list2<boost::_bi::value<qpid::sys::posix::AsynchIO*>,
      boost::_bi::value<boost::function1<void, qpid::sys::AsynchIO&,
      std::allocator<boost::function_base> > > > >, void,
      qpid::sys::DispatchHandle&>::invoke (function_obj_ptr=<value optimized out>,
      a0=<value optimized out>) at /usr/include/boost/bind/mem_fn_template.hpp:149
      #4 0x00002aaefd5ee49f in boost::function1<void, qpid::sys::DispatchHandle&,
      std::allocator<boost::function_base> >::operator() (this=0x2aaefd5e9c20,
      a0=@0x1a8778a0) at /usr/include/boost/function/function_template.hpp:576
      #5 0x00002aaefd5ed2b6 in qpid::sys::DispatchHandle::processEvent
      (this=0x1a8778a8, type=qpid::sys::Poller::READ_WRITABLE)
      at ../../src/qpid/sys/DispatchHandle.cpp:309
      #6 0x00002aaefd526e98 in qpid::sys::Poller::run (this=0x19b4ee00) at
      ../../src/qpid/sys/Poller.h:125
      #7 0x00002aaefd0844e4 in qpid::broker::Broker::run (this=<value optimized
      out>) at ../../src/qpid/broker/Broker.cpp:339
      #8 0x0000000000409014 in QpiddDaemon::child (this=0x7fff989850d0) at
      ../../src/posix/QpiddBroker.cpp:130
      #9 0x00002aaefd0a482f in qpid::broker::Daemon::fork (this=0x7fff989850d0) at
      ../../src/qpid/broker/Daemon.cpp:91
      #10 0x0000000000407585 in QpiddBroker::execute (this=<value optimized out>,
      options=<value optimized out>) at ../../src/posix/QpiddBroker.cpp:168
      #11 0x00000000004057ff in main (argc=12, argv=0x7fff989856b8) at
      ../../src/qpidd.cpp:80
      (gdb) print codec
      $1 = (class qpid::sys::ConnectionCodec *) 0x2aaab1b330c0
      (gdb) thread apply all bt

      Thread 8 (process 9484):
      #0 0x00000038f9c92149 in strftime_l () from /lib64/libc.so.6
      #1 0x00002aaefd51f111 in qpid::sys::outputFormattedNow (o=@0x413214b0) at
      ../../src/qpid/sys/posix/Time.cpp:89
      #2 0x00002aaefd5da5a1 in qpid::log::Logger::log (this=0x2aaefd895680,
      s=@0x2aaefd894d80, msg=@0x41321770) at ../../src/qpid/log/Logger.cpp:77
      #3 0x00002aaefd5dfa9c in qpid::log::Statement::log (this=0x2aaefd894d80,
      message=@0x41321f90) at ../../src/qpid/log/Statement.cpp:57
      #4 0x00002aaefd5f3b4b in qpid::sys::Timer::run (this=0x19b59b40) at
      ../../src/qpid/sys/Timer.cpp:129
      #5 0x00002aaefd51f14a in runRunnable (p=0x413213f0) at
      ../../src/qpid/sys/posix/Thread.cpp:35
      #6 0x00000038fa406617 in start_thread () from /lib64/libpthread.so.0
      #7 0x00000038f9cd3c2d in clone () from /lib64/libc.so.6

      Thread 7 (process 9486):
      #0 0x00000038fa40ad09 in pthread_cond_wait@@GLIBC_2.3.2 () from
      /lib64/libpthread.so.0
      #1 0x00002aaefd5f3b73 in qpid::sys::Timer::run (this=0x19b5f490) at
      ../../include/qpid/sys/posix/Condition.h:63
      #2 0x00002aaefd51f14a in runRunnable (p=0x19b5f4c4) at
      ../../src/qpid/sys/posix/Thread.cpp:35
      #3 0x00000038fa406617 in start_thread () from /lib64/libpthread.so.0
      #4 0x00000038f9cd3c2d in clone () from /lib64/libc.so.6

      Thread 6 (process 9487):
      #0 0x00000038f9cd4018 in epoll_wait () from /lib64/libc.so.6
      #1 0x00002aaefd526343 in qpid::sys::Poller::wait (this=0x19b4ee00,
      timeout=<value optimized out>) at ../../src/qpid/sys/epoll/EpollPoller.cpp:570
      #2 0x00002aaefd526ea7 in qpid::sys::Poller::run (this=0x19b4ee00) at
      ../../src/qpid/sys/epoll/EpollPoller.cpp:517
      #3 0x00002aaefd51f14a in runRunnable (p=0x5) at
      ../../src/qpid/sys/posix/Thread.cpp:35
      #4 0x00000038fa406617 in start_thread () from /lib64/libpthread.so.0
      #5 0x00000038f9cd3c2d in clone () from /lib64/libc.so.6

      Thread 5 (process 9488):
      #0 0x00000038f9cd3f9a in epoll_ctl () from /lib64/libc.so.6
      #1 0x00002aaefd525fd7 in qpid::sys::PollerPrivate::resetMode (this=0x19b593a0,
      eh=@0x2aaab4090920) at ../../src/qpid/sys/epoll/EpollPoller.cpp:389
      #2 0x00002aaefd52624d in qpid::sys::Poller::wait (this=0x19b4ee00,
      timeout=<value optimized out>) at ../../src/qpid/sys/epoll/EpollPoller.cpp:558
      #3 0x00002aaefd526ea7 in qpid::sys::Poller::run (this=0x19b4ee00) at
      ../../src/qpid/sys/epoll/EpollPoller.cpp:517
      #4 0x00002aaefd51f14a in runRunnable (p=0x5) at
      ../../src/qpid/sys/posix/Thread.cpp:35
      #5 0x00000038fa406617 in start_thread () from /lib64/libpthread.so.0
      #6 0x00000038f9cd3c2d in clone () from /lib64/libc.so.6

      Thread 4 (process 9489):
      #0 0x00000038f9cd447b in accept () from /lib64/libc.so.6
      #1 0x00002aaefd511ca4 in qpid::sys::Socket::accept (this=<value optimized
      out>) at ../../src/qpid/sys/posix/Socket.cpp:215
      #2 0x00002aaefd51a085 in qpid::sys::posix::AsynchAcceptor::readable
      (this=0x19b634b0, h=@0x19b634d0) at ../../src/qpid/sys/posix/AsynchIO.cpp:121
      #3 0x00002aaefd5ee49f in boost::function1<void, qpid::sys::DispatchHandle&,
      std::allocator<boost::function_base> >::operator() (this=0x157, a0=@0x0)
      at /usr/include/boost/function/function_template.hpp:576
      #4 0x00002aaefd5ed1f9 in qpid::sys::DispatchHandle::processEvent
      (this=0x19b634d0, type=qpid::sys::Poller::READABLE)
      at ../../src/qpid/sys/DispatchHandle.cpp:278
      #5 0x00002aaefd526e98 in qpid::sys::Poller::run (this=0x19b4ee00) at
      ../../src/qpid/sys/Poller.h:125
      #6 0x00002aaefd51f14a in runRunnable (p=0x13) at
      ../../src/qpid/sys/posix/Thread.cpp:35
      #7 0x00000038fa406617 in start_thread () from /lib64/libpthread.so.0
      #8 0x00000038f9cd3c2d in clone () from /lib64/libc.so.6

      Thread 3 (process 9490):
      #0 0x00000038f9cd3f9a in epoll_ctl () from /lib64/libc.so.6
      #1 0x00002aaefd525fd7 in qpid::sys::PollerPrivate::resetMode (this=0x19b593a0,
      eh=@0x2aaab40dc910) at ../../src/qpid/sys/epoll/EpollPoller.cpp:389
      #2 0x00002aaefd52624d in qpid::sys::Poller::wait (this=0x19b4ee00,
      timeout=<value optimized out>) at ../../src/qpid/sys/epoll/EpollPoller.cpp:558
      #3 0x00002aaefd526ea7 in qpid::sys::Poller::run (this=0x19b4ee00) at
      ../../src/qpid/sys/epoll/EpollPoller.cpp:517
      #4 0x00002aaefd51f14a in runRunnable (p=0x5) at
      ../../src/qpid/sys/posix/Thread.cpp:35
      #5 0x00000038fa406617 in start_thread () from /lib64/libpthread.so.0
      #6 0x00000038f9cd3c2d in clone () from /lib64/libc.so.6

      Thread 2 (process 9516):
      #0 0x00000038f9cd4018 in epoll_wait () from /lib64/libc.so.6
      #1 0x00002aaefd526343 in qpid::sys::Poller::wait (this=0x19b65e60,
      timeout=<value optimized out>) at ../../src/qpid/sys/epoll/EpollPoller.cpp:570
      #2 0x00002aaefd526ea7 in qpid::sys::Poller::run (this=0x19b65e60) at
      ../../src/qpid/sys/epoll/EpollPoller.cpp:517
      #3 0x00002aaefd51f14a in runRunnable (p=0x14) at
      ../../src/qpid/sys/posix/Thread.cpp:35
      #4 0x00000038fa406617 in start_thread () from /lib64/libpthread.so.0
      #5 0x00000038f9cd3c2d in clone () from /lib64/libc.so.6

      Thread 1 (process 9483):
      #0 0x00002aaefd5e999d in qpid::sys::AsynchIOHandler::disconnect
      (this=0x1a4f0880) at ../../src/qpid/sys/AsynchIOHandler.cpp:194
      #1 0x00002aaefd5e9c29 in qpid::sys::AsynchIOHandler::eof (this=0x2aaab1b330c0,
      a=@0x1a8778a0) at ../../src/qpid/sys/AsynchIOHandler.cpp:177
      #2 0x00002aaefd51ccbf in boost::function1<void, qpid::sys::AsynchIO&,
      std::allocator<boost::function_base> >::operator() (this=0x2aaefd5e9c20,
      a0=@0x1a8778a0) at /usr/include/boost/function/function_template.hpp:576
      #3 0x00002aaefd51c873 in
      boost::detail::function::void_function_obj_invoker1<boost::_bi::bind_t<void,
      boost::_mfi::mf1<void, qpid::sys::posix::AsynchIO, boost::function1<void,
      qpid::sys::AsynchIO&, std::allocator<boost::function_base> > >,
      boost::_bi::list2<boost::_bi::value<qpid::sys::posix::AsynchIO*>,
      boost::_bi::value<boost::function1<void, qpid::sys::AsynchIO&,
      std::allocator<boost::function_base> > > > >, void,
      qpid::sys::DispatchHandle&>::invoke (function_obj_ptr=<value optimized out>,
      a0=<value optimized out>) at /usr/include/boost/bind/mem_fn_template.hpp:149
      #4 0x00002aaefd5ee49f in boost::function1<void, qpid::sys::DispatchHandle&,
      std::allocator<boost::function_base> >::operator() (this=0x2aaefd5e9c20,
      a0=@0x1a8778a0) at /usr/include/boost/function/function_template.hpp:576
      #5 0x00002aaefd5ed2b6 in qpid::sys::DispatchHandle::processEvent
      (this=0x1a8778a8, type=qpid::sys::Poller::READ_WRITABLE)
      at ../../src/qpid/sys/DispatchHandle.cpp:309
      #6 0x00002aaefd526e98 in qpid::sys::Poller::run (this=0x19b4ee00) at
      ../../src/qpid/sys/Poller.h:125
      #7 0x00002aaefd0844e4 in qpid::broker::Broker::run (this=<value optimized
      out>) at ../../src/qpid/broker/Broker.cpp:339
      #8 0x0000000000409014 in QpiddDaemon::child (this=0x7fff989850d0) at
      ../../src/posix/QpiddBroker.cpp:130
      #9 0x00002aaefd0a482f in qpid::broker::Daemon::fork (this=0x7fff989850d0) at
      ../../src/qpid/broker/Daemon.cpp:91
      #10 0x0000000000407585 in QpiddBroker::execute (this=<value optimized out>,
      options=<value optimized out>) at ../../src/posix/QpiddBroker.cpp:168
      #11 0x00000000004057ff in main (argc=12, argv=0x7fff989856b8) at
      ../../src/qpidd.cpp:80
      (gdb)

        Activity

        Gordon Sim added a comment -

        Start lots of concurrent clients with heartbeats enabled running against a broker (I think I used a clustered broker) and then periodically kill clients and broker nodes. It can take quite a while to reproduce. I haven't revisited this on trunk since I raised the bug, but I believe it will still be there.

        I believe the following patch fixes it:

        --- a/qpid/cpp/src/qpid/sys/DispatchHandle.cpp
        +++ b/qpid/cpp/src/qpid/sys/DispatchHandle.cpp
        @@ -302,12 +302,22 @@ void DispatchHandle::processEvent(Poller::EventType type) {
                 // (because we use a copy from before the previous callbacks we won't
                 // do anything yet that was just added)
                 while (callbacks.size() > 0) {
        +            {
        +            ScopedLock<Mutex> lock(stateLock);
        +            switch (state) {
        +            case DELETING:
        +                goto finishcallbacks;
        +            default:
        +                break;
        +            }
        +            }
                     Callback cb = callbacks.front();
                     assert(cb);
                     cb(*this);
                     callbacks.pop();
                 }
        +finishcallbacks:
                 {
                 ScopedLock<Mutex> lock(stateLock);
                 switch (state) {
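
The idea behind the patch, re-checking the handle's state under the lock before each queued callback and abandoning the rest once deletion has started, can be sketched in isolation. This is a simplified model under stated assumptions, not the Qpid code: `Dispatcher`, `markDeleting`, `drain` and `runGuardedDrain` are hypothetical names, `std::mutex` and `std::lock_guard` stand in for Qpid's `Mutex` and `ScopedLock`, and a plain `break` replaces the patch's `goto` because this sketch has no enclosing `switch`.

```cpp
#include <functional>
#include <mutex>
#include <queue>
#include <utility>

// Hypothetical, simplified model of DispatchHandle.
class Dispatcher {
public:
    enum State { ACTIVE, DELETING };
    using Callback = std::function<void(Dispatcher&)>;

    void queue(Callback cb) { callbacks.push(std::move(cb)); }

    // Called by a callback that has torn down the owning handler.
    void markDeleting() {
        std::lock_guard<std::mutex> lock(stateLock);
        state = DELETING;
    }

    // Drains the queue, re-checking the state before every callback.
    // Returns the number of callbacks that were actually invoked.
    int drain() {
        int invoked = 0;
        while (!callbacks.empty()) {
            {
                std::lock_guard<std::mutex> lock(stateLock);
                if (state == DELETING) break;  // skip the remaining callbacks
            }
            Callback cb = callbacks.front();
            cb(*this);  // lock is not held here, so cb may call markDeleting()
            callbacks.pop();
            ++invoked;
        }
        return invoked;
    }

private:
    std::mutex stateLock;
    State state = ACTIVE;
    std::queue<Callback> callbacks;
};

// Returns {callbacks invoked, callbacks that ran after deletion began}.
std::pair<int, int> runGuardedDrain() {
    Dispatcher d;
    int ranAfterDelete = 0;
    d.queue([](Dispatcher& self) { self.markDeleting(); });
    d.queue([&ranAfterDelete](Dispatcher&) { ++ranAfterDelete; });
    int invoked = d.drain();
    return {invoked, ranAfterDelete};
}
```

In this model the first callback starts deletion, so the second is never invoked. The check is made with the lock released before the callback runs, which lets a callback change the state without deadlocking on a non-recursive mutex.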

        Andrew Stitcher added a comment -

        Gordon, do you have a replicator for this bug? And does it still exhibit?


          People

          • Assignee:
            Andrew Stitcher
            Reporter:
            Gordon Sim