Qpid
  1. Qpid
  2. QPID-3911

Consumer.close() and session.rollback() deadlocks

    Details

    • Type: Bug Bug
    • Status: Resolved
    • Priority: Major Major
    • Resolution: Fixed
    • Affects Version/s: 0.15
    • Fix Version/s: 0.16
    • Component/s: Java Client
    • Labels:
    • Environment:

      0.16(beta/RC1) Java Client and Java Broker

      Description

      Found one Java-level deadlock:
      =============================
      "Dispatcher-Channel-0":
      waiting to lock monitor 0x0000000001e65ec8 (object
      0x00000007c180bd58, a java.lang.Object),
      which is held by "main"
      "main":
      waiting to lock monitor 0x0000000001cffbc8 (object
      0x00000007c2e10c08, a java.lang.Object),
      which is held by "Dispatcher-Channel-0"

      Java stack information for the threads listed above:
      ===================================================
      "Dispatcher-Channel-0":
      at
      org.apache.qpid.client.AMQConnection.exceptionReceived(AMQConnection.java:1255)

      • waiting to lock <0x00000007c180bd58> (a java.lang.Object)
        at
        org.apache.qpid.client.AMQSession_0_10.setCurrentException(AMQSession_0_10.java:1057)
        at
        org.apache.qpid.client.AMQSession_0_10.sync(AMQSession_0_10.java:1034)
        at
        org.apache.qpid.client.AMQSession_0_10.sendSuspendChannel(AMQSession_0_10.java:851)
        at
        org.apache.qpid.client.AMQSession.suspendChannel(AMQSession.java:3075)
      • locked <0x00000007c2c3d330> (a java.lang.Object)
        at org.apache.qpid.client.AMQSession.rollback(AMQSession.java:1854)
      • locked <0x00000007c2c3d330> (a java.lang.Object)
        at
        QpidConsumerCloseRollbackDeadlock$QpidMqHandler.onMessage(QpidConsumerCloseRollbackDeadlock.java:208)
        at
        org.apache.qpid.client.BasicMessageConsumer.notifyMessage(BasicMessageConsumer.java:745)
        at
        org.apache.qpid.client.BasicMessageConsumer_0_10.notifyMessage(BasicMessageConsumer_0_10.java:141)
        at
        org.apache.qpid.client.BasicMessageConsumer.notifyMessage(BasicMessageConsumer.java:719)
        at
        org.apache.qpid.client.BasicMessageConsumer_0_10.notifyMessage(BasicMessageConsumer_0_10.java:186)
        at
        org.apache.qpid.client.BasicMessageConsumer_0_10.notifyMessage(BasicMessageConsumer_0_10.java:54)
        at
        org.apache.qpid.client.AMQSession$Dispatcher.notifyConsumer(AMQSession.java:3467)
        at
        org.apache.qpid.client.AMQSession$Dispatcher.dispatchMessage(AMQSession.java:3406)
      • locked <0x00000007c2c3d350> (a java.lang.Object)
      • locked <0x00000007c2e10c08> (a java.lang.Object)
        at
        org.apache.qpid.client.AMQSession$Dispatcher.access$1000(AMQSession.java:3180)
        at org.apache.qpid.client.AMQSession.dispatch(AMQSession.java:3173)
        at
        org.apache.qpid.client.message.UnprocessedMessage.dispatch(UnprocessedMessage.java:54)
        at
        org.apache.qpid.client.AMQSession$Dispatcher.run(AMQSession.java:3329)
        at java.lang.Thread.run(Thread.java:636)
        "main":
        at
        org.apache.qpid.client.AMQSession$Dispatcher.rejectPending(AMQSession.java:3211)
      • waiting to lock <0x00000007c2e10c08> (a java.lang.Object)
        at
        org.apache.qpid.client.AMQSession.confirmConsumerCancelled(AMQSession.java:903)
        at
        org.apache.qpid.client.BasicMessageConsumer_0_10.sendCancel(BasicMessageConsumer_0_10.java:170)
        at
        org.apache.qpid.client.BasicMessageConsumer.close(BasicMessageConsumer.java:593)
      • locked <0x00000007c180bd58> (a java.lang.Object)
        at
        org.apache.qpid.client.BasicMessageConsumer.close(BasicMessageConsumer.java:555)
        at
        QpidConsumerCloseRollbackDeadlock.main(QpidConsumerCloseRollbackDeadlock.java:77)

      Found 1 deadlock.

        Activity

        Hide
        Praveen Murugesan added a comment -

        Test to simulate the scenario

        Show
        Praveen Murugesan added a comment - Test to simulate the scenario
        Hide
        Praveen Murugesan added a comment -

        Deadlock stack traces.

        Show
        Praveen Murugesan added a comment - Deadlock stack traces.
        Hide
        Alex Rudyy added a comment -

        Invoking of MessageConsumer#close() and Session#rollback from a consumer listener results in a deadlock on 0-10 path and a timeout exception on 0-9 path

        On 0-10 path the deadlock is caused by an ExecutionException sent from the broker in response to message.stop, message.flow commands (sent from Session#rollback for suspension of the channel) when message.stop or message.flow command is delivered to broker after message.cancel command which is sent as part of MessageConsumer#close(). Command message.cancel results in a deletion of a subscription on a broker side and for the following message.stop or message.flow commands the subscription cannot be found and ExceutionException is reported:

        org.apache.qpid.AMQException: ch=0 id=2 ExecutionException(errorCode=NOT_FOUND, commandId=20, classCode=4, commandCode=12, fieldIndex=0, description=not-found: Unknown destination 1 session=guest@QPID.89a222e3-e3ca-4aee-8e4f-a78468f4fed1 (qpid/broker/SemanticState.cpp:569), errorInfo={}) [error code 404: not found]
        

        The ExecutionException is sent from both Java and C++ Brokers

        On receiving an ExecutionException connection tries to acquire the FailoverMutex in order to close itself in AMQConnection#exceptionReceived method.

        However, the FailoverMutex is acquired in MessageConsumer#close() which is waiting for a release of a session dispatcher lock. The last is hold in a dispatcher thread. The closing of a connection occurs in a dispatcher thread and this results in a deadlock.

        Show
        Alex Rudyy added a comment - Invoking of MessageConsumer#close() and Session#rollback from a consumer listener results in a deadlock on 0-10 path and a timeout exception on 0-9 path On 0-10 path the deadlock is caused by an ExecutionException sent from the broker in response to message.stop, message.flow commands (sent from Session#rollback for suspension of the channel) when message.stop or message.flow command is delivered to broker after message.cancel command which is sent as part of MessageConsumer#close(). Command message.cancel results in a deletion of a subscription on a broker side and for the following message.stop or message.flow commands the subscription cannot be found and ExceutionException is reported: org.apache.qpid.AMQException: ch=0 id=2 ExecutionException(errorCode=NOT_FOUND, commandId=20, classCode=4, commandCode=12, fieldIndex=0, description=not-found: Unknown destination 1 session=guest@QPID.89a222e3-e3ca-4aee-8e4f-a78468f4fed1 (qpid/broker/SemanticState.cpp:569), errorInfo={}) [error code 404: not found] The ExecutionException is sent from both Java and C++ Brokers On receiving an ExecutionException connection tries to acquire the FailoverMutex in order to close itself in AMQConnection#exceptionReceived method. However, the FailoverMutex is acquired in MessageConsumer#close() which is waiting for a release of a session dispatcher lock. The last is hold in a dispatcher thread. The closing of a connection occurs in a dispatcher thread and this results in a deadlock.
        Hide
        Alex Rudyy added a comment -

        I attached a patch fixing the issue

        Show
        Alex Rudyy added a comment - I attached a patch fixing the issue
        Hide
        Praveen Murugesan added a comment -

        Thanks a lot Alex for getting to this real quick

        Show
        Praveen Murugesan added a comment - Thanks a lot Alex for getting to this real quick
        Hide
        jiraposter@reviews.apache.org added a comment -

        -----------------------------------------------------------
        This is an automatically generated e-mail. To reply, visit:
        https://reviews.apache.org/r/4546/
        -----------------------------------------------------------

        Review request for qpid, Robbie Gemmell, rajith attapattu, Keith Wall, and Rob Godfrey.

        Summary
        -------

        Invoking of MessageConsumer#close() and Session#rollback from a consumer listener results in a deadlock on 0-10 path and a timeout exception on 0-9 path

        On 0-10 path the deadlock is caused by an ExecutionException sent from the broker in response to message.stop, message.flow commands (sent from Session#rollback for the suspension of the channel) when message.stop or message.flow command is delivered to the broker after message.cancel command which is sent as part of MessageConsumer#close(). Command message.cancel results in a deletion of a subscription on the broker side and for the following message.stop or message.flow command the subscription cannot be found and ExceutionException is reported:

        org.apache.qpid.AMQException: ch=0 id=2 ExecutionException(errorCode=NOT_FOUND, commandId=20, classCode=4, commandCode=12, fieldIndex=0, description=not-found: Unknown destination 1 session=guest@QPID.89a222e3-e3ca-4aee-8e4f-a78468f4fed1 (qpid/broker/SemanticState.cpp:569), errorInfo={}) [error code 404: not found]

        The ExecutionException is sent from both Java and C++ Brokers

        On receiving an ExecutionException connection tries to acquire the FailoverMutex in order to close itself in AMQConnection#exceptionReceived method.

        However, the FailoverMutex is acquired in MessageConsumer#close() which is waiting for a release of a session dispatcher lock. The last is hold in a dispatcher thread. The closing of a connection occurs in a dispatcher thread and this results in a deadlock.

        The suggested patch introduces the following changes:
        Adds synchronization on AMSession#_messageDeliveryLock into MessageConsumer#close() in order to block until message listener in progress has completed(as required in JMS javadoc for MessageConsumer#close())
        Changes the session dispatcher to stop the message delivery into consumer local message queue if the consumer in the process of closing. This eliminates the need to stop the dispatcher on rejecting pending messages for closing consumer.
        Removes the synchronization on a session dispatcher lock from AMQSession.Dispatcher#rejectPending and code to stop the dispatcher as we are synchronizing on a deliveryLock now and incoming messages are not dispatched into closing consumer anymore.
        Adds a system test to reproduce the deadlock

        This addresses bug QPID-3911.
        https://issues.apache.org/jira/browse/QPID-3911

        Diffs


        /trunk/qpid/java/client/src/main/java/org/apache/qpid/client/AMQSession.java 1306567
        /trunk/qpid/java/client/src/main/java/org/apache/qpid/client/BasicMessageConsumer.java 1306567
        /trunk/qpid/java/systests/src/main/java/org/apache/qpid/test/unit/close/MessageConsumerCloseTest.java PRE-CREATION

        Diff: https://reviews.apache.org/r/4546/diff

        Testing
        -------

        Thanks,

        Oleksandr

        Show
        jiraposter@reviews.apache.org added a comment - ----------------------------------------------------------- This is an automatically generated e-mail. To reply, visit: https://reviews.apache.org/r/4546/ ----------------------------------------------------------- Review request for qpid, Robbie Gemmell, rajith attapattu, Keith Wall, and Rob Godfrey. Summary ------- Invoking of MessageConsumer#close() and Session#rollback from a consumer listener results in a deadlock on 0-10 path and a timeout exception on 0-9 path On 0-10 path the deadlock is caused by an ExecutionException sent from the broker in response to message.stop, message.flow commands (sent from Session#rollback for the suspension of the channel) when message.stop or message.flow command is delivered to the broker after message.cancel command which is sent as part of MessageConsumer#close(). Command message.cancel results in a deletion of a subscription on the broker side and for the following message.stop or message.flow command the subscription cannot be found and ExceutionException is reported: org.apache.qpid.AMQException: ch=0 id=2 ExecutionException(errorCode=NOT_FOUND, commandId=20, classCode=4, commandCode=12, fieldIndex=0, description=not-found: Unknown destination 1 session=guest@QPID.89a222e3-e3ca-4aee-8e4f-a78468f4fed1 (qpid/broker/SemanticState.cpp:569), errorInfo={}) [error code 404: not found] The ExecutionException is sent from both Java and C++ Brokers On receiving an ExecutionException connection tries to acquire the FailoverMutex in order to close itself in AMQConnection#exceptionReceived method. However, the FailoverMutex is acquired in MessageConsumer#close() which is waiting for a release of a session dispatcher lock. The last is hold in a dispatcher thread. The closing of a connection occurs in a dispatcher thread and this results in a deadlock. The suggested patch introduces the following changes: Adds synchronization on AMSession#_messageDeliveryLock into MessageConsumer#close() in order to block until message listener in progress has completed(as required in JMS javadoc for MessageConsumer#close()) Changes the session dispatcher to stop the message delivery into consumer local message queue if the consumer in the process of closing. This eliminates the need to stop the dispatcher on rejecting pending messages for closing consumer. Removes the synchronization on a session dispatcher lock from AMQSession.Dispatcher#rejectPending and code to stop the dispatcher as we are synchronizing on a deliveryLock now and incoming messages are not dispatched into closing consumer anymore. Adds a system test to reproduce the deadlock This addresses bug QPID-3911 . https://issues.apache.org/jira/browse/QPID-3911 Diffs /trunk/qpid/java/client/src/main/java/org/apache/qpid/client/AMQSession.java 1306567 /trunk/qpid/java/client/src/main/java/org/apache/qpid/client/BasicMessageConsumer.java 1306567 /trunk/qpid/java/systests/src/main/java/org/apache/qpid/test/unit/close/MessageConsumerCloseTest.java PRE-CREATION Diff: https://reviews.apache.org/r/4546/diff Testing ------- Thanks, Oleksandr
        Hide
        Robbie Gemmell added a comment -

        Rob and I both looked over the patch separately and think it seems ok, and it passes the tests fine locally. I have applied it to trunk and we will to subject it to CI over the weekend to give it fuller workout before asking for inclusion in 0.16.

        Show
        Robbie Gemmell added a comment - Rob and I both looked over the patch separately and think it seems ok, and it passes the tests fine locally. I have applied it to trunk and we will to subject it to CI over the weekend to give it fuller workout before asking for inclusion in 0.16.
        Hide
        Justin Ross added a comment -

        Reviewed by Robbie and Rob, as indicated in the comments above. Approved for 0.16.

        Show
        Justin Ross added a comment - Reviewed by Robbie and Rob, as indicated in the comments above. Approved for 0.16.
        Hide
        Robbie Gemmell added a comment -

        Changes merged to the 0.16 branch.

        Show
        Robbie Gemmell added a comment - Changes merged to the 0.16 branch.

          People

          • Assignee:
            Robbie Gemmell
            Reporter:
            Praveen Murugesan
          • Votes:
            0 Vote for this issue
            Watchers:
            0 Start watching this issue

            Dates

            • Created:
              Updated:
              Resolved:

              Development