Qpid
  1. Qpid
  2. QPID-3704

SortedQueueTest.testTransactedSortedQueue failing occasionally on 0-9-1 profile

    Details

    • Type: Bug Bug
    • Status: Closed
    • Priority: Major Major
    • Resolution: Fixed
    • Affects Version/s: None
    • Fix Version/s: 0.15
    • Component/s: Java Broker, Java Tests
    • Labels:
      None

      Description

      Test appears to have begun failing after the performance improvement work.

      See https://builds.apache.org/view/M-R/view/Qpid/job/Qpid-Java-Java-TestMatrix/188 and https://builds.apache.org/view/M-R/view/Qpid/job/Qpid-Java-Java-TestMatrix/184

      Incorrect number of messages received expected:<200> but was:<199>
      Stacktrace
      
      junit.framework.AssertionFailedError: Incorrect number of messages received expected:<200> but was:<199>
      	at org.apache.qpid.server.queue.SortedQueueTest.runThroughSortedQueueForSessionMode(SortedQueueTest.java:153)
      	at org.apache.qpid.server.queue.SortedQueueTest.testTransactedSortedQueue(SortedQueueTest.java:115)
      	at org.apache.qpid.test.utils.QpidBrokerTestCase.runBare(QpidBrokerTestCase.java:240)
      	at org.apache.qpid.test.utils.QpidTestCase.run(QpidTestCase.java:135)
      

        Activity

        Hide
        Keith Wall added a comment -

        Failure is also affecting testClientAckSortedQueue and testAutoAckSortedQueue too.

        On my machine, I see this test fail approximately 1 time in 5. By modifying the test to call AMQSession#getQueueDepth() it can be seen that the Broker still has queue depth of 1 at the end of the test. Also, by examining the logs, it can be seen that the missing message (always has property value " 20" on my machine) is never actually being sent to the client at all.

        I've been unable to reproduce this problem by running the same test on the 0.5x branch.

        Show
        Keith Wall added a comment - Failure is also affecting testClientAckSortedQueue and testAutoAckSortedQueue too. On my machine, I see this test fail approximately 1 time in 5. By modifying the test to call AMQSession#getQueueDepth() it can be seen that the Broker still has queue depth of 1 at the end of the test. Also, by examining the logs, it can be seen that the missing message (always has property value " 20" on my machine) is never actually being sent to the client at all. I've been unable to reproduce this problem by running the same test on the 0.5x branch.
        Hide
        Keith Wall added a comment - - edited

        Problem is a race condition between an incoming enqueue and an outbound delivery. It is trigger by a particular data pattern. I think the problem lies in OutOfOrderQueue#checkSubscriptionsNotAheadOfDelivery().

        Imagine a set of messages with keys

        {3, 5, 4}

        . Message 3 has been sent to a consumer, message 5 is in the process of being delivered, and message 4 is being enqueued.

        When the unlucky timing strikes, the thread processing the enqueue observes _lastSeenEntry at 3, so does not update _releasedEntry. A moment later the delivery thread sending Message 5 to the consumer calls #setLastSeenEntry setting _lastSeenEntry to 5. This means message 4 becomes stuck and will remain undelivered to the consumer until a message with key less than 4 arrives.

        I think the _lastSeenEntry optimisation is unsafe and incoming deliveries should always _releasedEntry backwards if necessary. I attach a patch. Any thoughts?

        Show
        Keith Wall added a comment - - edited Problem is a race condition between an incoming enqueue and an outbound delivery. It is trigger by a particular data pattern. I think the problem lies in OutOfOrderQueue#checkSubscriptionsNotAheadOfDelivery(). Imagine a set of messages with keys {3, 5, 4} . Message 3 has been sent to a consumer, message 5 is in the process of being delivered, and message 4 is being enqueued. When the unlucky timing strikes, the thread processing the enqueue observes _lastSeenEntry at 3, so does not update _releasedEntry. A moment later the delivery thread sending Message 5 to the consumer calls #setLastSeenEntry setting _lastSeenEntry to 5. This means message 4 becomes stuck and will remain undelivered to the consumer until a message with key less than 4 arrives. I think the _lastSeenEntry optimisation is unsafe and incoming deliveries should always _releasedEntry backwards if necessary. I attach a patch. Any thoughts?
        Hide
        Rob Godfrey added a comment -

        Agree with your analysis, and the patch looks good to me

        Show
        Rob Godfrey added a comment - Agree with your analysis, and the patch looks good to me
        Hide
        Keith Wall added a comment -

        Patch applied.

        Show
        Keith Wall added a comment - Patch applied.
        Hide
        Robbie Gemmell added a comment -

        Marking reviewed, Rob already looked at it and it looks good to me too.

        Show
        Robbie Gemmell added a comment - Marking reviewed, Rob already looked at it and it looks good to me too.

          People

          • Assignee:
            Keith Wall
            Reporter:
            Keith Wall
          • Votes:
            0 Vote for this issue
            Watchers:
            0 Start watching this issue

            Dates

            • Created:
              Updated:
              Resolved:

              Development