Qpid
  1. Qpid
  2. QPID-2900

CommitRollbackTest#testGetThenRollback and CommitRollbackTest#testSend2ThenRollback fail on java.0.10 profile

    Details

    • Type: Bug Bug
    • Status: Closed
    • Priority: Critical Critical
    • Resolution: Fixed
    • Affects Version/s: None
    • Fix Version/s: 0.9
    • Component/s: Java Broker, Java Client
    • Labels:
      None

      Description

      org.apache.qpid.test.unit.transacted.CommitRollbackTest#testGetThenRollback
      org.apache.qpid.test.unit.transacted.CommitRollbackTest#testSend2ThenRollback

      These two tests fail very regularly (but not always) on the java.0.10 test profile:

      TestName: testGetThenRollback Duration: 6.882
      test message was consumed and rolled back, but is gone

      junit.framework.AssertionFailedError: test message was consumed and rolled back, but is gone
      at org.apache.qpid.test.unit.transacted.CommitRollbackTest.testGetThenRollback(CommitRollbackTest.java:273)
      at org.apache.qpid.test.utils.QpidBrokerTestCase.runBare(QpidBrokerTestCase.java:232)
      at org.apache.qpid.test.utils.QpidTestCase.run(QpidTestCase.java:120)

      TestName: testSend2ThenRollback Duration: 6.879
      Didn't receive redelivered message one

      junit.framework.AssertionFailedError: Didn't receive redelivered message one
      at org.apache.qpid.test.unit.transacted.CommitRollbackTest.verifyMessages(CommitRollbackTest.java:379)
      at org.apache.qpid.test.unit.transacted.CommitRollbackTest.verifyMessages(CommitRollbackTest.java:415)
      at org.apache.qpid.test.unit.transacted.CommitRollbackTest.testSend2ThenRollback(CommitRollbackTest.java:357)
      at org.apache.qpid.test.utils.QpidBrokerTestCase.runBare(QpidBrokerTestCase.java:232)
      at org.apache.qpid.test.utils.QpidTestCase.run(QpidTestCase.java:120)

        Activity

        Robbie Gemmell created issue -
        Robbie Gemmell made changes -
        Field Original Value New Value
        Assignee Robbie Gemmell [ gemmellr ]
        Robbie Gemmell made changes -
        Priority Major [ 3 ] Critical [ 2 ]
        Hide
        Robbie Gemmell added a comment -

        Deferring to 0.9

        Show
        Robbie Gemmell added a comment - Deferring to 0.9
        Robbie Gemmell made changes -
        Fix Version/s 0.9 [ 12315382 ]
        Fix Version/s 0.7 [ 12314455 ]
        Keith Wall made changes -
        Assignee Robbie Gemmell [ gemmellr ] Keith Wall [ k-wall ]
        Hide
        Keith Wall added a comment -

        I've traced the problem down to a race condition - broker side - within SimpleAMQQueue.

        The nub of the problem is the update of the atomic QueueContext._releasedEntry. There is a circumstance where _releasedEntry is overwritten prematurely and therefore the broker fails to resend the message to the client.

        Here's how it happens:

        In the cases where the tests fail, the thread (SubFlushRunner or QueueRunner) executing SimpleAMQQueue.deliverMessage() is yielded part way through the method. The thread has successfully sent the message to the client (MESSAGE_TRANSFER etc), but the JVM yields the thread before it reaches setLastSeen(). (SetLastSeen is responsible for advancing the lastSeen pointer, and resetting _releasedEntry if necessary).

        Meanwhile, the client receives the message, calls JMS rollback, which causes MESSASE_RELEASE (etc) command to be sent to the broker, before it calls JMS receive(). The broker processes the MESSAGE_RELEASE by invoking SimpleAMQ.requeue() which in turn invokes updateSubRequeueEntry() to update _releaseEntry with the QueueEntry to be resent.

        Now, the JVM wakes-up the thread (SubFlushRunner or QueueRunner) that was part way through processing SimpleAMQQueue .deliverMessage() for the initial send of the message. The setLastSeen() method is invoked, which **incorrectly** resets _releasedEntry to null because _releaseEntry now is the same as the entry being processed. The broker goes around the loop, calls getNextAvailableEntry which returns null because _releaseEntry has been prematurely cleared.

        I've illustrated the success and failure cases as sequence diagrams (attached to this JIRA).

        Show
        Keith Wall added a comment - I've traced the problem down to a race condition - broker side - within SimpleAMQQueue. The nub of the problem is the update of the atomic QueueContext._releasedEntry. There is a circumstance where _releasedEntry is overwritten prematurely and therefore the broker fails to resend the message to the client. Here's how it happens: In the cases where the tests fail, the thread (SubFlushRunner or QueueRunner) executing SimpleAMQQueue.deliverMessage() is yielded part way through the method. The thread has successfully sent the message to the client (MESSAGE_TRANSFER etc), but the JVM yields the thread before it reaches setLastSeen(). (SetLastSeen is responsible for advancing the lastSeen pointer, and resetting _releasedEntry if necessary). Meanwhile, the client receives the message, calls JMS rollback, which causes MESSASE_RELEASE (etc) command to be sent to the broker, before it calls JMS receive(). The broker processes the MESSAGE_RELEASE by invoking SimpleAMQ.requeue() which in turn invokes updateSubRequeueEntry() to update _releaseEntry with the QueueEntry to be resent. Now, the JVM wakes-up the thread (SubFlushRunner or QueueRunner) that was part way through processing SimpleAMQQueue .deliverMessage() for the initial send of the message. The setLastSeen() method is invoked, which ** incorrectly ** resets _releasedEntry to null because _releaseEntry now is the same as the entry being processed. The broker goes around the loop, calls getNextAvailableEntry which returns null because _releaseEntry has been prematurely cleared. I've illustrated the success and failure cases as sequence diagrams (attached to this JIRA).
        Hide
        Keith Wall added a comment -

        Two sequence diagrams to illustrate the issue.

        Show
        Keith Wall added a comment - Two sequence diagrams to illustrate the issue.
        Keith Wall made changes -
        Attachment qpid2900_sequence_diagram_success.png [ 12470437 ]
        Attachment qpid2900_sequence_diagram_failed.png [ 12470438 ]
        Hide
        Keith Wall added a comment -

        Patch to address this JIRA.

        SimpleAMQQueue changed to eliminate race condition.

        Bolstered unit tests around the area of the change.

        Changed TestableMemoryMessageStore to avoid a NPE. Previously this was causing a NPE to be logged during tearDown of some broker unit tests (including SimpleAMQQueueTest) but not causing test failures.

        Show
        Keith Wall added a comment - Patch to address this JIRA. SimpleAMQQueue changed to eliminate race condition. Bolstered unit tests around the area of the change. Changed TestableMemoryMessageStore to avoid a NPE. Previously this was causing a NPE to be logged during tearDown of some broker unit tests (including SimpleAMQQueueTest) but not causing test failures.
        Keith Wall made changes -
        Hide
        Keith Wall added a comment -

        Hi Robbie - can you review this patch please?

        Show
        Keith Wall added a comment - Hi Robbie - can you review this patch please?
        Keith Wall made changes -
        Assignee Keith Wall [ k-wall ] Robbie Gemmell [ gemmellr ]
        Robbie Gemmell made changes -
        Status Open [ 1 ] In Progress [ 3 ]
        Robbie Gemmell made changes -
        Status In Progress [ 3 ] Ready To Review [ 10006 ]
        Hide
        Robbie Gemmell added a comment -

        Patch applied.

        Show
        Robbie Gemmell added a comment - Patch applied.
        Robbie Gemmell made changes -
        Status Ready To Review [ 10006 ] Resolved [ 5 ]
        Resolution Fixed [ 1 ]
        Rob Godfrey made changes -
        Status Resolved [ 5 ] Closed [ 6 ]
        Transition Time In Source Status Execution Times Last Executer Last Execution Date
        Open Open In Progress In Progress
        118d 19h 55m 1 Robbie Gemmell 08/Feb/11 10:25
        In Progress In Progress Reviewable Reviewable
        7s 1 Robbie Gemmell 08/Feb/11 10:25
        Reviewable Reviewable Resolved Resolved
        14s 1 Robbie Gemmell 08/Feb/11 10:25
        Resolved Resolved Closed Closed
        1464d 9h 40m 1 Rob Godfrey 11/Feb/15 20:06

          People

          • Assignee:
            Robbie Gemmell
            Reporter:
            Robbie Gemmell
          • Votes:
            0 Vote for this issue
            Watchers:
            0 Start watching this issue

            Dates

            • Created:
              Updated:
              Resolved:

              Development