The commit made in rev 1057460 uncovered a deeper issue that violates the atomicity of a transaction disrupted by failover.
The symptom was that one or two messages seemed to get onto the queue outside of the transaction boundaries.
Upon closer inspection, these were messages that were part of the failed transaction. If the application retries the failed transaction, it results in duplicates, further complicating the issue.
The underlying root cause is as follows.
1. When a message-transfer reaches the invoke method in Session.java and the session state is detached at that time, the thread waits until the session is OPEN or CLOSED.
2. If failover completes within the wait period, the session is resumed and marked OPEN, and the message transfer in progress simply resumes and reaches the broker.
3. At this point the session is still not marked transactional (there is also no logic in place to ever issue a txSelect after failover), so the message is enqueued outside any transaction.
4. In the meantime, the JMS session used by the application learns that failover has happened, is marked dirty, and an exception is received.
5. If the application chooses to resume the session (ignoring the exception), then subsequent message transfers will get to the queue on the broker, but the session will get closed once it sends a commit (or a rollback), as the broker will complain that the session is not transactional.
6. If the application chooses to create a new session, then it will start sending subsequent messages within transaction boundaries and work as expected. However, the queue will still have the extra one or two messages that sneaked in when the old session was reopened. If the application retries the aborted transaction, it will result in duplicates due to the messages that sneaked in.
A reasonable solution to this issue is to:
1) Close a session marked transactional immediately when the session detaches, i.e., a transactional session is never resumed and a new session must be created to continue.
2) Document this behaviour clearly.
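The proposed behaviour in 1) could be sketched roughly as follows. This is only an illustration under assumptions: SketchSession, its State enum, and the detach/resume/invoke signatures are hypothetical and do not mirror the real Session.java. The point is that a transacted session goes straight to CLOSED on detach, so a thread blocked in invoke fails instead of resuming after failover.

```java
// Hypothetical sketch: class and method names are illustrative only and do
// not mirror the real Session.java.
class SketchSession {
    enum State { OPEN, DETACHED, CLOSED }

    private final boolean transacted;
    private State state = State.OPEN;

    SketchSession(boolean transacted) {
        this.transacted = transacted;
    }

    // Called when the connection is lost. A transacted session is closed
    // outright; a non-transacted one is detached and may be resumed later.
    synchronized void detach() {
        state = transacted ? State.CLOSED : State.DETACHED;
        notifyAll(); // wake any thread blocked in invoke()
    }

    // Called when failover completes. A CLOSED session stays closed.
    synchronized void resume() {
        if (state == State.DETACHED) {
            state = State.OPEN;
            notifyAll();
        }
    }

    // Waits while the session is detached, then either proceeds (OPEN) or
    // fails (CLOSED or timed out).
    synchronized void invoke(String command, long timeoutMillis) {
        long deadline = System.nanoTime() + timeoutMillis * 1_000_000L;
        while (state == State.DETACHED) {
            long remainingMillis = (deadline - System.nanoTime()) / 1_000_000L;
            if (remainingMillis <= 0) {
                throw new IllegalStateException("timed out waiting for resume: " + command);
            }
            try {
                wait(remainingMillis);
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt();
                throw new IllegalStateException("interrupted: " + command, e);
            }
        }
        if (state == State.CLOSED) {
            throw new IllegalStateException("session closed, create a new one: " + command);
        }
        // state is OPEN here; the real client would send the command now.
    }

    synchronized State getState() {
        return state;
    }
}
```

With this shape, an in-flight message-transfer on a transacted session can never slip through after failover, because the wait loop in invoke only continues for sessions that were detached rather than closed.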
Also, during the investigation I found a race condition where an application could create a new session (recreating one due to an exception, or creating a completely new session in the midst of failover) before the connection is open.
This results in the session attach being sent before the connection negotiation is completed. Although the connect method and the createSession method in Connection.java contend for the same lock, the connect method, which acquires it first, releases the lock while it waits (until the connection reaches the OPEN state), and the createSession method waiting on the lock then acquires it and continues. This actually exposed a bug in the C++ broker. See
We need to ensure that the createSession method is not executed until the connection reaches the OPEN state. I will open a separate JIRA for this.
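One possible way to close this race is for createSession to check the connection state itself, in a wait loop on the shared lock, rather than relying on lock ownership alone. The sketch below assumes hypothetical names (SketchConnection, negotiationCompleted, a timeout parameter) and is not the actual Connection.java code.

```java
// Hypothetical sketch: names are illustrative and do not mirror Connection.java.
class SketchConnection {
    enum State { NEW, OPEN, CLOSED }

    private final Object lock = new Object();
    private State state = State.NEW;
    private int sessionCount = 0;

    // Stands in for the point where connection negotiation finishes.
    void negotiationCompleted() {
        synchronized (lock) {
            state = State.OPEN;
            lock.notifyAll();
        }
    }

    void close() {
        synchronized (lock) {
            state = State.CLOSED;
            lock.notifyAll();
        }
    }

    // Holding the lock is not sufficient, because a waiting connect() gives
    // it up. The state is therefore re-checked in a loop before any session
    // attach would be sent.
    int createSession(long timeoutMillis) {
        synchronized (lock) {
            long deadline = System.nanoTime() + timeoutMillis * 1_000_000L;
            while (state == State.NEW) {
                long remainingMillis = (deadline - System.nanoTime()) / 1_000_000L;
                if (remainingMillis <= 0) {
                    throw new IllegalStateException("connection did not open in time");
                }
                try {
                    lock.wait(remainingMillis);
                } catch (InterruptedException e) {
                    Thread.currentThread().interrupt();
                    throw new IllegalStateException("interrupted while waiting for OPEN", e);
                }
            }
            if (state == State.CLOSED) {
                throw new IllegalStateException("connection is closed");
            }
            // state is OPEN: the real client would send session attach here.
            return ++sessionCount;
        }
    }
}
```

Because Object.wait releases the monitor, the state check has to live inside createSession as well; the lock alone only serialises access, it does not order the two methods.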
(*) Another race condition: if a session is created after the connection is set up and marked OPEN, but before the resume method (in Connection.java) is called, the new session is attached a second time. This could result in unnecessary duplication of messages.
We need to ensure that the createSession method is not executed until the resume method has completed. Again, I will open a separate JIRA for this.
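As a rough illustration of that ordering requirement, the sketch below gates createSession on a "resume pending" flag so that a session created during failover is attached exactly once. All names here (ResumeGate, connectionLost, attachLog) are hypothetical and chosen for the example; the real fix in Connection.java may look quite different.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical sketch: names are illustrative and do not mirror Connection.java.
class ResumeGate {
    private boolean resumePending = false;
    private final List<String> sessions = new ArrayList<>();
    private final List<String> attachLog = new ArrayList<>();

    // Failover has started; existing sessions will need to be re-attached.
    synchronized void connectionLost() {
        resumePending = true;
    }

    // Re-attaches every known session, then lets createSession proceed.
    synchronized void resume() {
        for (String name : sessions) {
            attachLog.add(name);
        }
        resumePending = false;
        notifyAll();
    }

    // Blocks while a resume is pending, so the new session is never in the
    // list that resume() walks and is therefore attached only once.
    synchronized void createSession(String name, long timeoutMillis) {
        long deadline = System.nanoTime() + timeoutMillis * 1_000_000L;
        while (resumePending) {
            long remainingMillis = (deadline - System.nanoTime()) / 1_000_000L;
            if (remainingMillis <= 0) {
                throw new IllegalStateException("resume did not complete in time");
            }
            try {
                wait(remainingMillis);
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt();
                throw new IllegalStateException("interrupted while waiting for resume", e);
            }
        }
        sessions.add(name);
        attachLog.add(name); // stands in for sending session attach
    }

    synchronized List<String> attachLog() {
        return new ArrayList<>(attachLog);
    }
}
```

In this sketch a session that existed before failover appears twice in the attach log (initial attach plus re-attach), while a session requested during failover appears once, after resume has finished.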