Qpid / QPID-8056

qpid::messaging::ConnectionContext crash after network disconnect (with patch)


Details

    • Type: Bug
    • Status: Open
    • Priority: Major
    • Resolution: Unresolved
    • Affects Version/s: qpid-cpp-1.36.0
    • Fix Version/s: None
    • Component/s: C++ Client
    • Environment: Red Hat Enterprise Linux 6

    • Patch

    Description

      When doing HA testing we found that our application often crashed inside the Qpid Messaging library.

      Our test:

      • One ActiveMQ broker.
      • Two proxies connecting to the AMQP port on the broker. At the start, only one of the proxies is running.
      • Test program configured to fail over between the two proxies, using protocol "amqp1.0". It reads messages in a loop using a transactional session. On error it closes the connection and opens a new one.
      • Three queues are read from in parallel, each reader using its own connection in a thread. Nothing is shared between the threads in the client code.
      • Send some messages and let the test program process them.
      • Stop proxy1 and start proxy2.
      • Send some more messages and let the test program process them.
      • Stop proxy2 and start proxy1.
      • And so on...

      After a couple of switches the test program crashes, but not always; the failure is timing-dependent.
      A typical error message that we see before the crash:

      Exception when trying to close the qpid connection: Transaction outcome unknown: transport failure
      

      The reason for the crash is that the poller thread is still active when the connection is being deleted. The destructor of the qpid::messaging::ConnectionContext class deletes the TcpTransport instance at the same time as, or right before, the poller thread is calling a callback on it (qpid::messaging::amqp::TcpTransport::disconnected).
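To illustrate the ordering problem described above, here is a minimal, self-contained sketch. It is not Qpid code and not the attached patch: `Transport`, `Poller`, `Context` and `run_cycles` are hypothetical stand-ins, and the only point is that the owner stops and joins the thread that issues callbacks before it deletes the object those callbacks run on.

```cpp
#include <atomic>
#include <chrono>
#include <memory>
#include <thread>

// Hypothetical stand-in for a transport that receives callbacks from a poller thread.
class Transport {
public:
    void disconnected() { ++callbacks_; }  // invoked from the poller thread
    int callbacks() const { return callbacks_.load(); }
private:
    std::atomic<int> callbacks_{0};
};

// Hypothetical stand-in for the poller: calls back into the transport until stopped.
class Poller {
public:
    explicit Poller(Transport& t)
        : thread_([this, &t] {
              while (!stop_.load()) {
                  t.disconnected();
                  std::this_thread::yield();
              }
          }) {}
    // After this returns, no further callback can be in flight.
    void stopAndJoin() {
        stop_ = true;
        if (thread_.joinable()) thread_.join();
    }
    ~Poller() { stopAndJoin(); }
private:
    std::atomic<bool> stop_{false};  // declared before thread_ so it is initialized first
    std::thread thread_;
};

// Hypothetical stand-in for the connection context that owns both objects.
class Context {
public:
    Context() : transport_(new Transport), poller_(new Poller(*transport_)) {}
    ~Context() {
        // Quiesce the poller first, then delete the transport.
        // Deleting the transport first would let a late callback touch freed memory.
        poller_->stopAndJoin();
        poller_.reset();
        transport_.reset();
    }
private:
    std::unique_ptr<Transport> transport_;
    std::unique_ptr<Poller> poller_;
};

// Creates and destroys a context n times; returns the number of completed cycles.
int run_cycles(int n) {
    int done = 0;
    for (int i = 0; i < n; ++i) {
        Context ctx;
        std::this_thread::sleep_for(std::chrono::milliseconds(1));
        ++done;
    }
    return done;
}
```

Swapping the order in `~Context()` so that the transport is reset while the poller is still running would reproduce the kind of use-after-free described here, although in this toy version it may only show up under a tool such as valgrind.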

      I have attached a patch to solve the issue, at least for this use case.

      I cannot test this on 1.37.0 because I cannot build that version on RHEL 6: RHEL 6 ships Python 2.6, which 1.37.0 no longer supports. The code in question is identical in 1.36.0 and 1.37.0, though.

      Attachments

        1. valgrind.txt
          9 kB
          Håkan Johansson
        2. connection_context.diff
          0.6 kB
          Håkan Johansson

          People

            jross Justin Ross
            hakanj Håkan Johansson