Qpid
  1. Qpid
  2. QPID-3893

C++ broker appears to segfault during MultipleTransactedBatchProducerTest

    Details

    • Type: Bug Bug
    • Status: Closed
    • Priority: Critical Critical
    • Resolution: Fixed
    • Affects Version/s: 0.15, 0.16, 0.17
    • Fix Version/s: 0.16, 0.17
    • Component/s: C++ Broker
    • Labels:
      None

      Description

      The C++ broker appears to segfault during MultipleTransactedBatchProducerTest. Several instances can be found in the history from our CI jobs:

      https://builds.apache.org/job/Qpid-Java-Cpp-Test/496/
      https://builds.apache.org/job/Qpid-Java-Cpp-Test/494/
      https://builds.apache.org/job/Qpid-Java-Cpp-Test/493/
      https://builds.apache.org/job/Qpid-Java-Cpp-Test/491/
      https://builds.apache.org/job/Qpid-Java-Cpp-Test/486/
      https://builds.apache.org/job/Qpid-Java-Cpp-Test/484/
      https://builds.apache.org/job/Qpid-Java-Cpp-Test/482/

      During course of the latest test failure in #496 (logs attached), the broker logged receiving yet another TxCommit command:

      BROKER: 2012-03-11 16:55:15 trace RECV [127.0.0.1:15672-127.0.0.1:34582]: Frame[BEbe; channel=0; {TxCommitBody: }] )
      

      After this nothing else is ever logged from the broker process, but exceptions start flying from the client indicating the connections were reset:

      IoReceiver - localhost/127.0.0.1:15672 2012-03-11 16:55:15,313 ERROR [apache.qpid.client.AMQConnectionDelegate_0_10] previous exception
      org.apache.qpid.transport.ConnectionException: Connection reset
      	at org.apache.qpid.transport.Connection.exception(Connection.java:512)
      	at org.apache.qpid.transport.network.Assembler.exception(Assembler.java:107)
      	at org.apache.qpid.transport.network.InputHandler.exception(InputHandler.java:199)
      	at org.apache.qpid.transport.network.io.IoReceiver.run(IoReceiver.java:169)
      	at java.lang.Thread.run(Thread.java:662)
      Caused by: java.net.SocketException: Connection reset
      	at java.net.SocketInputStream.read(SocketInputStream.java:168)
      	at org.apache.qpid.transport.network.io.IoReceiver.run(IoReceiver.java:147)
      	... 1 more
      
      IoSender - localhost/127.0.0.1:15672 2012-03-11 16:55:15,313 ERROR [transport.network.io.IoSender] error in write thread
      java.net.SocketException: Connection reset
      	at java.net.SocketOutputStream.socketWrite(SocketOutputStream.java:96)
      	at java.net.SocketOutputStream.write(SocketOutputStream.java:136)
      	at org.apache.qpid.transport.network.io.IoSender.run(IoSender.java:313)
      	at java.lang.Thread.run(Thread.java:662)
      

      When the test tearDown occurs, it is logged that the broker process exited with an abnormal looking exit code, which a quick google suggests is the result of a segfault:

      main 2012-03-11 16:56:29,949 INFO [qpid.test.utils.QpidBrokerTestCase] stopping broker on port : 15672
      main 2012-03-11 16:56:29,949 INFO [qpid.test.utils.SpawnedBrokerHolder] Destroying broker process
      main 2012-03-11 16:56:29,949 INFO [qpid.test.utils.SpawnedBrokerHolder] broker exited: 139
      

      The test fails after 75sec, which is the exact time the test itself allows for the process undertaken to complete (artificially high to allow for extreme slowdowns on the CI nodes which happened occasionally in the past and also the C++ broker being slower at the test due to use of client side selectors; passing results show this usually takes 1-3sec on the Java broker and <10sec on the [transient] C++ broker), however the test harness tries to clean up the connections use in the test during tearDown and actually ends up reporting issues cleaning those up rather than the fact that the CountDownLatch used in the test wasnt decremented enough times.

        Activity

        Robbie Gemmell created issue -
        Robbie Gemmell made changes -
        Field Original Value New Value
        Description The C++ broker appears to segfault during MultipleTransactedBatchProducerTest. Several instances can be found in the history from our CI jobs:

        https://builds.apache.org/job/Qpid-Java-Cpp-Test/496/
        https://builds.apache.org/job/Qpid-Java-Cpp-Test/494/
        https://builds.apache.org/job/Qpid-Java-Cpp-Test/493/
        https://builds.apache.org/job/Qpid-Java-Cpp-Test/491/
        https://builds.apache.org/job/Qpid-Java-Cpp-Test/486/
        https://builds.apache.org/job/Qpid-Java-Cpp-Test/484/
        https://builds.apache.org/job/Qpid-Java-Cpp-Test/482/

        During course of the latest test failure in #496 (logs attached), the broker logged receiving yet another TxCommit command:

        {noformat}
        BROKER: 2012-03-11 16:55:15 trace RECV [127.0.0.1:15672-127.0.0.1:34582]: Frame[BEbe; channel=0; {TxCommitBody: }] )
        {noformat}

        After this nothing else is ever logged from the broker process, but exceptions start flying from the client indicating the connections were reset:

        {noformat}
        IoReceiver - localhost/127.0.0.1:15672 2012-03-11 16:55:15,313 ERROR [apache.qpid.client.AMQConnectionDelegate_0_10] previous exception
        org.apache.qpid.transport.ConnectionException: Connection reset
        at org.apache.qpid.transport.Connection.exception(Connection.java:512)
        at org.apache.qpid.transport.network.Assembler.exception(Assembler.java:107)
        at org.apache.qpid.transport.network.InputHandler.exception(InputHandler.java:199)
        at org.apache.qpid.transport.network.io.IoReceiver.run(IoReceiver.java:169)
        at java.lang.Thread.run(Thread.java:662)
        Caused by: java.net.SocketException: Connection reset
        at java.net.SocketInputStream.read(SocketInputStream.java:168)
        at org.apache.qpid.transport.network.io.IoReceiver.run(IoReceiver.java:147)
        ... 1 more
        {noformat}

        {noformat}
        IoSender - localhost/127.0.0.1:15672 2012-03-11 16:55:15,313 ERROR [transport.network.io.IoSender] error in write thread
        java.net.SocketException: Connection reset
        at java.net.SocketOutputStream.socketWrite(SocketOutputStream.java:96)
        at java.net.SocketOutputStream.write(SocketOutputStream.java:136)
        at org.apache.qpid.transport.network.io.IoSender.run(IoSender.java:313)
        at java.lang.Thread.run(Thread.java:662)
        {noformat}

        When the test tearDown occurs, it is logged that the broker process exited with an abnormal looking exit code, which a quick google suggests is the result of a segfault:
        {noformat}
        main 2012-03-11 16:56:29,949 INFO [qpid.test.utils.QpidBrokerTestCase] stopping broker on port : 15672
        main 2012-03-11 16:56:29,949 INFO [qpid.test.utils.SpawnedBrokerHolder] Destroying broker process
        main 2012-03-11 16:56:29,949 INFO [qpid.test.utils.SpawnedBrokerHolder] broker exited: 139
        {noformat}

        The test fails after 75sec, which is the exact time the test itself allows for the process undertaken to complete (artificially high to allow for extreme slowdowns on the CI nodes which happened occasionally in the past and also the C++ broker being slower at the test due to use of client side selectors; passing results show this usually takes 1-3sec on the Java broker and <10sec on the [transient] C++ broker), however the test hardness tries to clean up the connections use in the test during tearDown and actually ends up reporting issues cleaning those up rather than the fact that the CountDownLatch used in the test wasnt decremented enough times.
        The C++ broker appears to segfault during MultipleTransactedBatchProducerTest. Several instances can be found in the history from our CI jobs:

        https://builds.apache.org/job/Qpid-Java-Cpp-Test/496/
        https://builds.apache.org/job/Qpid-Java-Cpp-Test/494/
        https://builds.apache.org/job/Qpid-Java-Cpp-Test/493/
        https://builds.apache.org/job/Qpid-Java-Cpp-Test/491/
        https://builds.apache.org/job/Qpid-Java-Cpp-Test/486/
        https://builds.apache.org/job/Qpid-Java-Cpp-Test/484/
        https://builds.apache.org/job/Qpid-Java-Cpp-Test/482/

        During course of the latest test failure in #496 (logs attached), the broker logged receiving yet another TxCommit command:

        {noformat}
        BROKER: 2012-03-11 16:55:15 trace RECV [127.0.0.1:15672-127.0.0.1:34582]: Frame[BEbe; channel=0; {TxCommitBody: }] )
        {noformat}

        After this nothing else is ever logged from the broker process, but exceptions start flying from the client indicating the connections were reset:

        {noformat}
        IoReceiver - localhost/127.0.0.1:15672 2012-03-11 16:55:15,313 ERROR [apache.qpid.client.AMQConnectionDelegate_0_10] previous exception
        org.apache.qpid.transport.ConnectionException: Connection reset
        at org.apache.qpid.transport.Connection.exception(Connection.java:512)
        at org.apache.qpid.transport.network.Assembler.exception(Assembler.java:107)
        at org.apache.qpid.transport.network.InputHandler.exception(InputHandler.java:199)
        at org.apache.qpid.transport.network.io.IoReceiver.run(IoReceiver.java:169)
        at java.lang.Thread.run(Thread.java:662)
        Caused by: java.net.SocketException: Connection reset
        at java.net.SocketInputStream.read(SocketInputStream.java:168)
        at org.apache.qpid.transport.network.io.IoReceiver.run(IoReceiver.java:147)
        ... 1 more
        {noformat}

        {noformat}
        IoSender - localhost/127.0.0.1:15672 2012-03-11 16:55:15,313 ERROR [transport.network.io.IoSender] error in write thread
        java.net.SocketException: Connection reset
        at java.net.SocketOutputStream.socketWrite(SocketOutputStream.java:96)
        at java.net.SocketOutputStream.write(SocketOutputStream.java:136)
        at org.apache.qpid.transport.network.io.IoSender.run(IoSender.java:313)
        at java.lang.Thread.run(Thread.java:662)
        {noformat}

        When the test tearDown occurs, it is logged that the broker process exited with an abnormal looking exit code, which a quick google suggests is the result of a segfault:
        {noformat}
        main 2012-03-11 16:56:29,949 INFO [qpid.test.utils.QpidBrokerTestCase] stopping broker on port : 15672
        main 2012-03-11 16:56:29,949 INFO [qpid.test.utils.SpawnedBrokerHolder] Destroying broker process
        main 2012-03-11 16:56:29,949 INFO [qpid.test.utils.SpawnedBrokerHolder] broker exited: 139
        {noformat}

        The test fails after 75sec, which is the exact time the test itself allows for the process undertaken to complete (artificially high to allow for extreme slowdowns on the CI nodes which happened occasionally in the past and also the C++ broker being slower at the test due to use of client side selectors; passing results show this usually takes 1-3sec on the Java broker and <10sec on the [transient] C++ broker), however the test harness tries to clean up the connections use in the test during tearDown and actually ends up reporting issues cleaning those up rather than the fact that the CountDownLatch used in the test wasnt decremented enough times.
        Robbie Gemmell made changes -
        Attachment testlog.zip [ 12517907 ]
        Robbie Gemmell made changes -
        Affects Version/s 0.16 [ 12319870 ]
        Affects Version/s 0.17 [ 12320179 ]
        Andrew Stitcher made changes -
        Assignee Andrew Stitcher [ astitcher ]
        Andrew Stitcher made changes -
        Fix Version/s 0.17 [ 12320179 ]
        Andrew Stitcher made changes -
        Andrew Stitcher made changes -
        Andrew Stitcher made changes -
        Status Open [ 1 ] Resolved [ 5 ]
        Fix Version/s 0.16 [ 12319870 ]
        Resolution Fixed [ 1 ]
        Justin Ross made changes -
        Status Resolved [ 5 ] Closed [ 6 ]

          People

          • Assignee:
            Andrew Stitcher
            Reporter:
            Robbie Gemmell
          • Votes:
            0 Vote for this issue
            Watchers:
            0 Start watching this issue

            Dates

            • Created:
              Updated:
              Resolved:

              Development