Details
-
Bug
-
Status: Closed
-
Major
-
Resolution: Fixed
-
qpid-cpp-0.34, qpid-cpp-1.36.0
-
Red Hat Enterprise Linux Server release 6.7 (Santiago)
The broker is ActiveMQ 5.13.0.
The protocol used in AMQP 1.0.
Description
I have a test program (see the "consumer.cc" attachment) that creates a connection with "reconnect" enabled.
It then creates a transactional session and a receiver to some queue from that session.
It then reads all messages from the queue and prints out their content.
A sleep is used between each read to make the test possible.
While the broker is down the program will try to reconnect to it.
As soon as it succeeds with that the fetch call throws an exception because the transaction has become invalid.
The exception is caught and the read loop is broken out of.
The test function then exits, causing the Receiver, Session, and Connection objects to be destructed.
The crash happens while destructing the Connection object.
It took some digging, but I managed to find the reason for the crash.
When the Connection object is destructed it automatically destructs its ConnectionHandle object, which in turn destructs its ConnectionContext object. Nothing strange here.
The ConnectionContext destructor makes a call to its own close method, which tries to shut down all its sessions.
The problem is that the session has been made invalid by the disconnect, which causes the call to syncLH to throw an exception,
which is not caught anywhere, indirectly causing the ConnectionContext destructor to throw an exception. This is a big no-no in C++.
A side effect of this is that the transport object is not closed before it is destructed,
which means that it is still listening for events. The crash happens when the next pending event tries to use
the destructed transport object.
The solution, in my humble opinion, is to catch the exception throws by the syncLH call in the ConnectionContext::close method.
This way we can try to close all sessions even if one or more of them are invalidated for some reason.
The rest of the cleanup process will also be done properly.
How to run the test program:
- Compile both "producer.cc" and "consumer.cc". They both need to be linked to the "qpidmessaging" library.
- Run "producer" once. This will add ten messages to the "apa.bepa" queue on the broker.
- Start "consumer".
- When the consumer starts to print out the messages, shut down and restart the broker.
A similar fix is available in 1.36, but I don't know which jira was responsible.