Uploaded image for project: 'Qpid'
  1. Qpid
  2. QPID-7051

Crash after reconnect with transactional session (with patch)

Details

    • Bug
    • Status: Closed
    • Major
    • Resolution: Fixed
    • qpid-cpp-0.34, qpid-cpp-1.36.0
    • qpid-cpp-1.38.0
    • C++ Client
    • Red Hat Enterprise Linux Server release 6.7 (Santiago)

      The broker is ActiveMQ 5.13.0.
      The protocol used in AMQP 1.0.

    Description

      I have a test program (see the "consumer.cc" attachment) that creates a connection with "reconnect" enabled.
      It then creates a transactional session and a receiver to some queue from that session.
      It then reads all messages from the queue and prints out their content.
      A sleep is used between each read to make the test possible.

      While the broker is down the program will try to reconnect to it.
      As soon as it succeeds with that the fetch call throws an exception because the transaction has become invalid.
      The exception is caught and the read loop is broken out of.
      The test function then exits, causing the Receiver, Session, and Connection objects to be destructed.

      The crash happens while destructing the Connection object.

      It took some digging, but I managed to find the reason for the crash.
      When the Connection object is destructed it automatically destructs its ConnectionHandle object, which in turn destructs its ConnectionContext object. Nothing strange here.
      The ConnectionContext destructor makes a call to its own close method, which tries to shut down all its sessions.

      The problem is that the session has been made invalid by the disconnect, which causes the call to syncLH to throw an exception,
      which is not caught anywhere, indirectly causing the ConnectionContext destructor to throw an exception. This is a big no-no in C++.

      A side effect of this is that the transport object is not closed before it is destructed,
      which means that it is still listening for events. The crash happens when the next pending event tries to use
      the destructed transport object.

      The solution, in my humble opinion, is to catch the exception throws by the syncLH call in the ConnectionContext::close method.
      This way we can try to close all sessions even if one or more of them are invalidated for some reason.
      The rest of the cleanup process will also be done properly.

      How to run the test program:

      • Compile both "producer.cc" and "consumer.cc". They both need to be linked to the "qpidmessaging" library.
      • Run "producer" once. This will add ten messages to the "apa.bepa" queue on the broker.
      • Start "consumer".
      • When the consumer starts to print out the messages, shut down and restart the broker.

      Attachments

        1. consumer.cc
          2 kB
          Håkan Johansson
        2. producer.cc
          1 kB
          Håkan Johansson
        3. qpid-7051.patch
          0.7 kB
          Håkan Johansson
        4. qpid-cpp.patch
          0.9 kB
          Håkan Johansson

        Activity

          hakanj Håkan Johansson added a comment - - edited

          A similar fix is available in 1.36, but I don't know which jira was responsible.

          hakanj Håkan Johansson added a comment - - edited A similar fix is available in 1.36, but I don't know which jira was responsible.

          The 1.36 fix is not enough. The ConnectionContext::close function only catches MessageRejected when calling syncLH, but it also need to catch TransactionAborted as well.

          The patch that I attached to this jira works because it catches all std::exception exceptions.

          hakanj Håkan Johansson added a comment - The 1.36 fix is not enough. The ConnectionContext::close function only catches MessageRejected when calling syncLH , but it also need to catch TransactionAborted as well. The patch that I attached to this jira works because it catches all std::exception exceptions.
          hakanj Håkan Johansson added a comment - - edited

          Added patch for 1.36.0 (qpid-7051.patch). This one works better than the previous one.

          hakanj Håkan Johansson added a comment - - edited Added patch for 1.36.0 ( qpid-7051.patch ). This one works better than the previous one.
          jross Justin Ross added a comment - https://github.com/apache/qpid-cpp/commit/2fda1139ae93018d4a3c2bd5571c02512ad7ebe8
          jross Justin Ross added a comment -
          jross Justin Ross added a comment - Released in Qpid C++ 1.38.0, http://qpid.apache.org/releases/qpid-cpp-1.38.0

          People

            jross Justin Ross
            hakanj Håkan Johansson
            Votes:
            0 Vote for this issue
            Watchers:
            2 Start watching this issue

            Dates

              Created:
              Updated:
              Resolved: