Qpid
  1. Qpid
  2. QPID-4014

System test ConflationQueueTest.testParallelProductionAndConsumption failed

    Details

    • Type: Bug Bug
    • Status: Closed
    • Priority: Minor Minor
    • Resolution: Fixed
    • Affects Version/s: None
    • Fix Version/s: 0.17
    • Component/s: Java Tests
    • Labels:
      None

      Description

      Unit test ConflationQueueTest.testParallelProductionAndConsumption has failed on Jenkins at least twice:

      This test has previously run successfully.

      On first glance, the test doesn't appear to have any inherent race conditions, therefore it is possible that this represents a genuine problem.

      In this test, two producers each each 400 messages with alternating conflation key values, followed by special "shutdown" messages. A single consumer remembers the latest message for each conflation key value.

      Each producer's last few messages should be:

      • Message 397 (conflation key '1')
      • Message 398 (conflation key '0')
      • Message 399 (conflation key '1')
      • Shutdown message

      The assertion in question checks the last message received for each conflation key value. In this case, the last message with key '1' is message 397, not the expected message 399.

      Examination of the test output (attached) indicates that the consumer received message 399 once (it must have been conflated), and that - surprisingly - it arrived after the two shutdown messages.

        Issue Links

          Activity

          Philip Harvey created issue -
          Hide
          Philip Harvey added a comment - - edited

          attached the output of the failing test (build 206)

          Show
          Philip Harvey added a comment - - edited attached the output of the failing test (build 206)
          Philip Harvey made changes -
          Hide
          Philip Harvey added a comment - - edited

          Here is a summary of the events in the attached log file

          line  event
          --------------------------------------------------------
          10474 sender2 sends 399-2 (timestamp 1337682147990, id dc2aaab4-9cd7-3d56-8e27-a1687458d2ed, message store id 777)
          10504 sender2 sends SHUTDOWN-2 (timestamp 1337682147990, id be63fcc4-6266-3189-b72b-ddff7bebb071)
          
          10906 sender1 sends 399-1 (timestamp 1337682147999, id 86d7a88c-160c-3e09-b297-43ee7e3aca3b, message store id 801)
          10927 sender1 sends SHUTDOWN-1 (same timstamp id cd4ff12b-5591-3e3d-ae67-4ff0a27bfe78, message store id 802)
          
          
          
          38015 broker receives 399-2
          
          38040 broker receives SHUTDOWN-2
          
          38249 broker sends SHUTDOWN-2
          38269 client receives SHUTDOWN-2
          
          39349 broker receives 399-1
          
          39398 broker receives SHUTDOWN-1
          39529 broker sends SHUTDOWN-1 from thread "pool-3-thread-1141" *oops we haven't sent 399-1 yet*
          
          39541 client receives SHUTDOWN-1
          
          39582 broker sends 399-1 from thread "pool-3-thread-1139"
          
          tearDown
          
          39601 client receives 399-1 *oops* 
          
          Show
          Philip Harvey added a comment - - edited Here is a summary of the events in the attached log file line event -------------------------------------------------------- 10474 sender2 sends 399-2 (timestamp 1337682147990, id dc2aaab4-9cd7-3d56-8e27-a1687458d2ed, message store id 777) 10504 sender2 sends SHUTDOWN-2 (timestamp 1337682147990, id be63fcc4-6266-3189-b72b-ddff7bebb071) 10906 sender1 sends 399-1 (timestamp 1337682147999, id 86d7a88c-160c-3e09-b297-43ee7e3aca3b, message store id 801) 10927 sender1 sends SHUTDOWN-1 (same timstamp id cd4ff12b-5591-3e3d-ae67-4ff0a27bfe78, message store id 802) 38015 broker receives 399-2 38040 broker receives SHUTDOWN-2 38249 broker sends SHUTDOWN-2 38269 client receives SHUTDOWN-2 39349 broker receives 399-1 39398 broker receives SHUTDOWN-1 39529 broker sends SHUTDOWN-1 from thread "pool-3-thread-1141" *oops we haven't sent 399-1 yet* 39541 client receives SHUTDOWN-1 39582 broker sends 399-1 from thread "pool-3-thread-1139" tearDown 39601 client receives 399-1 *oops*
          Hide
          Philip Harvey added a comment - - edited

          I've just seen this problem occur again. I repeatedly ran the test on my PC and eventually saw it fail on the 406th attempt.

          I've attached the test output as TEST-org.apache.qpid.server.queue.ConflationQueueTest.testParallelProductionAndConsumption.out.2nd-occurence.gz.

          To filter the test output to only show the relevant messages (i.e. message 399 and the SHUTDOWN messages), use this Linux command:

          egrep -n -a2 '(msg=399|SHUTDOWN)' TEST-org.apache.qpid.server.queue.ConflationQueueTest.testParallelProductionAndConsumption.out
          
          Show
          Philip Harvey added a comment - - edited I've just seen this problem occur again. I repeatedly ran the test on my PC and eventually saw it fail on the 406th attempt. I've attached the test output as TEST-org.apache.qpid.server.queue.ConflationQueueTest.testParallelProductionAndConsumption.out.2nd-occurence.gz. To filter the test output to only show the relevant messages (i.e. message 399 and the SHUTDOWN messages), use this Linux command: egrep -n -a2 '(msg=399|SHUTDOWN)' TEST-org.apache.qpid.server.queue.ConflationQueueTest.testParallelProductionAndConsumption.out
          Hide
          Philip Harvey added a comment -

          test output from the second occurrence of this test failure.

          Show
          Philip Harvey added a comment - test output from the second occurrence of this test failure.
          Philip Harvey made changes -
          Assignee Keith Wall [ k-wall ] Philip Harvey [ philharveyonline ]
          Philip Harvey made changes -
          Description Unit test ConflationQueueTest.testParallelProductionAndConsumption has failed on Jenkins:

          https://builds.apache.org/view/M-R/view/Qpid/job/Qpid-Java-Java-BDB-TestMatrix/jdk=JDK%201.6%20%28latest%29,label=Ubuntu,profile=java-bdb.0-10/lastCompletedBuild/testReport/org.apache.qpid.server.queue/ConflationQueueTest/testParallelProductionAndConsumption/

          This test has previously run successfully.

          On first glance, the test doesn't appear to have any inherent race conditions, therefore it is possible that this represents a genuine problem.

          In this test, two producers each each 400 messages with alternating conflation key values, followed by special "shutdown" messages. A single consumer remembers the latest message for each conflation key value.

          Each producer's last few messages should be:
          * Message 397 (conflation key '1')
          * Message 398 (conflation key '0')
          * Message 399 (conflation key '1')
          * Shutdown message

          The assertion in question checks the last message received for each conflation key value. In this case, the last message with key '1' is message 397, not the expected message 399.

          Examination of the test output (attached) indicates that the consumer received message 399 once (it must have been conflated), and that - surprisingly - it arrived *after* the two shutdown messages.
          Unit test ConflationQueueTest.testParallelProductionAndConsumption has failed on Jenkins at least twice:

          * [Build 206|https://builds.apache.org/view/M-R/view/Qpid/job/Qpid-Java-Java-BDB-TestMatrix/jdk=JDK%201.6%20(latest),label=Ubuntu,profile=java-bdb.0-9-1/206/testReport/org.apache.qpid.server.queue/ConflationQueueTest/testParallelProductionAndConsumption/]
          * [Build 291|https://builds.apache.org/job/Qpid-Java-Java-BDB-TestMatrix/291/jdk=JDK%201.6%20(latest),label=Ubuntu,profile=java-bdb.0-9-1/testReport/junit/org.apache.qpid.server.queue/ConflationQueueTest/testParallelProductionAndConsumption/]

          This test has previously run successfully.

          On first glance, the test doesn't appear to have any inherent race conditions, therefore it is possible that this represents a genuine problem.

          In this test, two producers each each 400 messages with alternating conflation key values, followed by special "shutdown" messages. A single consumer remembers the latest message for each conflation key value.

          Each producer's last few messages should be:
          * Message 397 (conflation key '1')
          * Message 398 (conflation key '0')
          * Message 399 (conflation key '1')
          * Shutdown message

          The assertion in question checks the last message received for each conflation key value. In this case, the last message with key '1' is message 397, not the expected message 399.

          Examination of the test output (attached) indicates that the consumer received message 399 once (it must have been conflated), and that - surprisingly - it arrived *after* the two shutdown messages.
          Hide
          Philip Harvey added a comment -

          attached log file from test that failed in build 291

          Show
          Philip Harvey added a comment - attached log file from test that failed in build 291
          Hide
          Philip Harvey added a comment -

          attached proposed patch to increase debug logging in ConflationQueueList to aid diagnosis of the occasional test failure.

          Show
          Philip Harvey added a comment - attached proposed patch to increase debug logging in ConflationQueueList to aid diagnosis of the occasional test failure.
          Philip Harvey made changes -
          Hide
          Robbie Gemmell added a comment -

          Patch applied.

          Show
          Robbie Gemmell added a comment - Patch applied.
          Hide
          Keith Wall added a comment - - edited

          Another instance of this test failure from build 419. Hopefully Phil's extra logging will help us understand the underlying cause. Log file attached to JIRA.

          Error Message

          The last message sent for each key should match the last message received for that key expected:<

          {1=399, 0=398}> but was:<{1=397, 0=398}>
          Stacktrace
          
          

          junit.framework.AssertionFailedError: The last message sent for each key should match the last message received for that key expected:<{1=399, 0=398}

          > but was:<

          {1=397, 0=398}

          >
          at org.apache.qpid.server.queue.ConflationQueueTest.testParallelProductionAndConsumption(ConflationQueueTest.java:414)
          at org.apache.qpid.test.utils.QpidBrokerTestCase.runBare(QpidBrokerTestCase.java:244)
          at org.apache.qpid.test.utils.QpidTestCase.run(QpidTestCase.java:137)

          
          
          Show
          Keith Wall added a comment - - edited Another instance of this test failure from build 419. Hopefully Phil's extra logging will help us understand the underlying cause. Log file attached to JIRA. Error Message The last message sent for each key should match the last message received for that key expected:< {1=399, 0=398}> but was:<{1=397, 0=398}> Stacktrace junit.framework.AssertionFailedError: The last message sent for each key should match the last message received for that key expected:<{1=399, 0=398} > but was:< {1=397, 0=398} > at org.apache.qpid.server.queue.ConflationQueueTest.testParallelProductionAndConsumption(ConflationQueueTest.java:414) at org.apache.qpid.test.utils.QpidBrokerTestCase.runBare(QpidBrokerTestCase.java:244) at org.apache.qpid.test.utils.QpidTestCase.run(QpidTestCase.java:137)
          Keith Wall made changes -
          Link This issue relates QPID-4171 [ QPID-4171 ]
          Hide
          Keith Wall added a comment -

          I think I understand the underlying cause, see QPID-4171.

          Show
          Keith Wall added a comment - I think I understand the underlying cause, see QPID-4171 .
          Keith Wall made changes -
          Status Open [ 1 ] Closed [ 6 ]
          Resolution Fixed [ 1 ]
          Hide
          Keith Wall added a comment -

          Fixed by QPID-4171.

          Show
          Keith Wall added a comment - Fixed by QPID-4171 .
          Gavin made changes -
          Link This issue relates to QPID-4171 [ QPID-4171 ]
          Gavin made changes -
          Link This issue relates to QPID-4171 [ QPID-4171 ]

            People

            • Assignee:
              Philip Harvey
              Reporter:
              Philip Harvey
            • Votes:
              0 Vote for this issue
              Watchers:
              3 Start watching this issue

              Dates

              • Created:
                Updated:
                Resolved:

                Development