Qpid
  1. Qpid
  2. QPID-4171

persistent messages from a non-transactional session can be enqueued out of order

    Details

    • Type: Bug Bug
    • Status: Resolved
    • Priority: Major Major
    • Resolution: Fixed
    • Affects Version/s: 0.16
    • Fix Version/s: 0.18
    • Component/s: Java Broker
    • Labels:
      None

      Description

      A defect in AsyncAutoCommitTransaction means that occasionally messages delivered by a non-transactional JMS are seen to be enqueued out-of-order and therefore delivered to a consumer out of the expected order.

      It is this problem that underlies QPID-4014, although as I think this problem is wider than ConflationQueues, I am raising this separately.

        Issue Links

          Activity

          Hide
          Keith Wall added a comment - - edited

          Background:

          AsyncAutoCommitTransaction#addFuture() is used by #enqueue() to arrange for the postTransactionAction to be executed as each store transaction is completed asynchronously by the (BDB store's) commit-thread.

          An optimisation in #addFuture means that an enqueue can 'jump-the-queue', and be enqueued before others that are already waiting in the future recorder list.

          A problematic sequence would be:

          1) Message 1 and Message 2 arrive from the wire (AMQProtocolEngine#received)
          2) Message 1 begins processing. AsyncAutoCommitTransaction.enqueue called, but as commit thread is running behind, its enqueue action joins list of the future recorder (AMQChannel#_unfinishedCommandsQueue).
          3) BDB commit thread catches up the backlog
          4) Message 2 begins processing. AsyncAutoCommitTransaction.enqueue called but as the StoreFuture is already completed, its enqueue action is executed immediately so Message 2 is enqueued.
          5) AMQProtocolEngine#complete calls receiveComplete() completing the actions in _unfinishedCommandsQueue. Message 1 is therefore enqueued.
          6) Message 1 and Message 2 are now out of order. Message 2 will be delivered to a Consumer before Message 1.

              private void addFuture(final StoreFuture future, final Action action)
              {
                  if(action != null)
                  {
                      if(future.isComplete())
                      {
                          action.postCommit();
                      }
                      else
                      {
                          _futureRecorder.recordFuture(future, action);
                      }
                  }
              }
          
          Show
          Keith Wall added a comment - - edited Background: AsyncAutoCommitTransaction#addFuture() is used by #enqueue() to arrange for the postTransactionAction to be executed as each store transaction is completed asynchronously by the (BDB store's) commit-thread. An optimisation in #addFuture means that an enqueue can 'jump-the-queue', and be enqueued before others that are already waiting in the future recorder list. A problematic sequence would be: 1) Message 1 and Message 2 arrive from the wire (AMQProtocolEngine#received) 2) Message 1 begins processing. AsyncAutoCommitTransaction.enqueue called, but as commit thread is running behind, its enqueue action joins list of the future recorder (AMQChannel#_unfinishedCommandsQueue). 3) BDB commit thread catches up the backlog 4) Message 2 begins processing. AsyncAutoCommitTransaction.enqueue called but as the StoreFuture is already completed, its enqueue action is executed immediately so Message 2 is enqueued. 5) AMQProtocolEngine#complete calls receiveComplete() completing the actions in _unfinishedCommandsQueue. Message 1 is therefore enqueued. 6) Message 1 and Message 2 are now out of order. Message 2 will be delivered to a Consumer before Message 1. private void addFuture( final StoreFuture future , final Action action) { if (action != null ) { if ( future .isComplete()) { action.postCommit(); } else { _futureRecorder.recordFuture( future , action); } } }
          Hide
          Robbie Gemmell added a comment -

          Renamed JIRA to indicate only persistent messages can be enqueued out of order, transient messages will always have an immediate future assigned and will thus enqueue in-order. It is possible for transient messages to overtake persistent messages, but thats is allowable (though we should have given people the option to prevent it).

          Attached are some helpful scribbles for anyone working on this. It includes a trivial potential fix (&& !persistent), along with some dead code removal, and addition of a noddy system property to escape the previously mentioned 'transient can overtake persistent' behaviour when mixing delivery modes. It semi-duplicates the existing addFuture() method because thats used by more than just the enqueue methods for things that care nothing of whether its persistent message or whether strict ordering is desired, so it seemed like it would be a bit nasty just feeding default parameter values to a shared implementation. This isn't really a patch as such, it hasn't even been compiled properly nevermind tested, but I think it should work

          Show
          Robbie Gemmell added a comment - Renamed JIRA to indicate only persistent messages can be enqueued out of order, transient messages will always have an immediate future assigned and will thus enqueue in-order. It is possible for transient messages to overtake persistent messages, but thats is allowable (though we should have given people the option to prevent it). Attached are some helpful scribbles for anyone working on this. It includes a trivial potential fix (&& !persistent), along with some dead code removal, and addition of a noddy system property to escape the previously mentioned 'transient can overtake persistent' behaviour when mixing delivery modes. It semi-duplicates the existing addFuture() method because thats used by more than just the enqueue methods for things that care nothing of whether its persistent message or whether strict ordering is desired, so it seemed like it would be a bit nasty just feeding default parameter values to a shared implementation. This isn't really a patch as such, it hasn't even been compiled properly nevermind tested, but I think it should work
          Hide
          Robbie Gemmell added a comment -

          (I could probably be persuaded to reverse the system property so that people have the option to turn that behaviour on rather than off, since out blocking IO layer probably means it makes only a small difference to overall performance currently because we have to complete everything synchronously when the read is finished.)

          Show
          Robbie Gemmell added a comment - (I could probably be persuaded to reverse the system property so that people have the option to turn that behaviour on rather than off, since out blocking IO layer probably means it makes only a small difference to overall performance currently because we have to complete everything synchronously when the read is finished.)
          Hide
          Philip Harvey added a comment -

          attached patch

          Show
          Philip Harvey added a comment - attached patch
          Hide
          Philip Harvey added a comment -

          please commit the patch I attached today

          Show
          Philip Harvey added a comment - please commit the patch I attached today
          Hide
          Alex Rudyy added a comment -

          I reviewed the patch and the changes look fine for me.

          Show
          Alex Rudyy added a comment - I reviewed the patch and the changes look fine for me.
          Hide
          Keith Wall added a comment -

          Patch applied.

          Show
          Keith Wall added a comment - Patch applied.
          Hide
          Justin Ross added a comment -

          Reviewed by Keith. Approved for 0.18.

          Show
          Justin Ross added a comment - Reviewed by Keith. Approved for 0.18.
          Hide
          Robbie Gemmell added a comment -

          Change merged to the 0.18 branch.

          Show
          Robbie Gemmell added a comment - Change merged to the 0.18 branch.

            People

            • Assignee:
              Robbie Gemmell
              Reporter:
              Keith Wall
            • Votes:
              0 Vote for this issue
              Watchers:
              5 Start watching this issue

              Dates

              • Created:
                Updated:
                Resolved:

                Development