Qpid
  1. Qpid
  2. QPID-3214

Deadlock between the failover mutex (in AMQConnection.java) and the current_exception_lock (in AMQSession.java)

    Details

    • Type: Bug Bug
    • Status: Resolved
    • Priority: Critical Critical
    • Resolution: Fixed
    • Affects Version/s: 0.10
    • Fix Version/s: 0.11
    • Component/s: Java Client
    • Labels:
      None

      Description

      As per the following thread dump you can clearly see the deadlock between the failover mutex in AMQConnection.java and the current_exception_lock in AMQSession.java
      This is a regression and was introduced in rev 985262

      Found one Java-level deadlock:
      =============================
      "IoReceiver - localhost/127.0.0.1:15672":
      waiting to lock monitor 0x0000002ac2ea3b70 (object 0x0000002ab70156b0, a java.lang.Object),
      which is held by "main"
      "main":
      waiting to lock monitor 0x0000002ac28db1b8 (object 0x0000002ab7048d70, a java.lang.Object),
      which is held by "IoReceiver - localhost/127.0.0.1:15672"

      Java stack information for the threads listed above:
      ===================================================
      "IoReceiver - localhost/127.0.0.1:15672":
      at org.apache.qpid.client.AMQConnection.exceptionReceived(AMQConnection.java:1297)

      • waiting to lock<0x0000002ab70156b0> (a java.lang.Object)
        at org.apache.qpid.client.AMQSession_0_10.setCurrentException(AMQSession_0_10.java:1033)
      • locked<0x0000002ab7048d70> (a java.lang.Object)
        at org.apache.qpid.client.AMQSession_0_10.exception(AMQSession_0_10.java:913)
        at org.apache.qpid.transport.SessionDelegate.executionException(SessionDelegate.java:156)
        at org.apache.qpid.transport.SessionDelegate.executionException(SessionDelegate.java:32)
        at org.apache.qpid.transport.ExecutionException.dispatch(ExecutionException.java:112)
        at org.apache.qpid.transport.SessionDelegate.command(SessionDelegate.java:50)
        at org.apache.qpid.transport.SessionDelegate.command(SessionDelegate.java:32)
        at org.apache.qpid.transport.Method.delegate(Method.java:159)
        at org.apache.qpid.transport.Session.received(Session.java:528)
        at org.apache.qpid.transport.Connection.dispatch(Connection.java:404)
        at org.apache.qpid.transport.ConnectionDelegate.handle(ConnectionDelegate.java:64)
        at org.apache.qpid.transport.ConnectionDelegate.handle(ConnectionDelegate.java:40)
        at org.apache.qpid.transport.MethodDelegate.executionException(MethodDelegate.java:110)
        at org.apache.qpid.transport.ExecutionException.dispatch(ExecutionException.java:112)
        at org.apache.qpid.transport.ConnectionDelegate.command(ConnectionDelegate.java:54)
        at org.apache.qpid.transport.ConnectionDelegate.command(ConnectionDelegate.java:40)
        at org.apache.qpid.transport.Method.delegate(Method.java:159)
        at org.apache.qpid.transport.Connection.received(Connection.java:369)
        at org.apache.qpid.transport.Connection.received(Connection.java:59)
        at org.apache.qpid.transport.network.Assembler.emit(Assembler.java:95)
        at org.apache.qpid.transport.network.Assembler.assemble(Assembler.java:196)
        at org.apache.qpid.transport.network.Assembler.frame(Assembler.java:129)
        at org.apache.qpid.transport.network.Frame.delegate(Frame.java:133)
        at org.apache.qpid.transport.network.Assembler.received(Assembler.java:100)
        at org.apache.qpid.transport.network.Assembler.received(Assembler.java:42)
        at org.apache.qpid.transport.network.InputHandler.next(InputHandler.java:187)
        at org.apache.qpid.transport.network.InputHandler.received(InputHandler.java:103)
        at org.apache.qpid.transport.network.InputHandler.received(InputHandler.java:42)
        at org.apache.qpid.transport.network.io.IoReceiver.run(IoReceiver.java:128)
        at java.lang.Thread.run(Thread.java:619)
        "main":
        at org.apache.qpid.client.AMQSession_0_10.setCurrentException(AMQSession_0_10.java:1025)
      • waiting to lock<0x0000002ab7048d70> (a java.lang.Object)
        at org.apache.qpid.client.BasicMessageConsumer_0_10.sendCancel(BasicMessageConsumer_0_10.java:193)
        at org.apache.qpid.client.BasicMessageConsumer.close(BasicMessageConsumer.java:573)
      • locked<0x0000002ab70156b0> (a java.lang.Object)
        at org.apache.qpid.client.BasicMessageConsumer.close(BasicMessageConsumer.java:535)
        at org.apache.qpid.client.AMQQueueBrowser.close(AMQQueueBrowser.java:102)
        at org.apache.qpid.test.client.QueueBrowserAutoAckTest.testFailoverWithQueueBrowser(QueueBrowserAutoAckTest.java:501)
        at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
        at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:39)
        at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:25)
        at java.lang.reflect.Method.invoke(Method.java:597)
        at junit.framework.TestCase.runTest(TestCase.java:154)
        at junit.framework.TestCase.runBare(TestCase.java:127)
        at org.apache.qpid.test.utils.QpidBrokerTestCase.runBare(QpidBrokerTestCase.java:234)
        at junit.framework.TestResult$1.protect(TestResult.java:106)
        at junit.framework.TestResult.runProtected(TestResult.java:124)
        at junit.framework.TestResult.run(TestResult.java:109)
        at junit.framework.TestCase.run(TestCase.java:118)
        at org.apache.qpid.test.utils.QpidTestCase.run(QpidTestCase.java:120)
        at junit.framework.TestSuite.runTest(TestSuite.java:208)
        at junit.framework.TestSuite.run(TestSuite.java:203)
        at org.apache.tools.ant.taskdefs.optional.junit.JUnitTestRunner.run(JUnitTestRunner.java:297)
        at org.apache.tools.ant.taskdefs.optional.junit.JUnitTestRunner.launch(JUnitTestRunner.java:672)
        at org.apache.tools.ant.taskdefs.optional.junit.JUnitTestRunner.main(JUnitTestRunner.java:546)

      Found 1 deadlock

      1. QPID-3214.patch
        0.7 kB
        Gordon Sim

        Activity

        Rajith Attapattu created issue -
        Hide
        Gordon Sim added a comment -

        Suggested fix. This simply takes the call to connection.exceptionReceived() outside the scope of the current_exception_lock in the session.

        Holding the session lock over that call would not in any case prevent other sessions calling in on the connection, so its not clear what value there is in doing so.

        Show
        Gordon Sim added a comment - Suggested fix. This simply takes the call to connection.exceptionReceived() outside the scope of the current_exception_lock in the session. Holding the session lock over that call would not in any case prevent other sessions calling in on the connection, so its not clear what value there is in doing so.
        Gordon Sim made changes -
        Field Original Value New Value
        Attachment QPID-3214.patch [ 12476886 ]
        Hide
        Gordon Sim added a comment -

        AMQSession_0_10.java seems to be duplicating in part what the transport.Session is doing (i.e. holding exception). In the trace above the threads are each trying to set the current exception on the session to the same thing I believe. My suggested 'fix' btw is a workaround of the deadlock only. It doesn't try to untangle the code.

        Show
        Gordon Sim added a comment - AMQSession_0_10.java seems to be duplicating in part what the transport.Session is doing (i.e. holding exception). In the trace above the threads are each trying to set the current exception on the session to the same thing I believe. My suggested 'fix' btw is a workaround of the deadlock only. It doesn't try to untangle the code.
        Hide
        Rajith Attapattu added a comment -

        I am actually keen to fix the exception handling part rather than addressing the deadlock directly.
        I agree with Gordon that the exception handling code could be improved a bit.
        I am hoping to work out a patch soon.

        Show
        Rajith Attapattu added a comment - I am actually keen to fix the exception handling part rather than addressing the deadlock directly. I agree with Gordon that the exception handling code could be improved a bit. I am hoping to work out a patch soon.
        Hide
        Rajith Attapattu added a comment - - edited

        After looking at the code, I believe the exception handling is not exactly correct.

        1. For starters we shouldn't really notify the connection's exception listener when there is a session level error. If we avoid that we can certainly get rid of the deadlock. The JMS API doc clearly states,
        "If a JMS provider detects a serious problem with a Connection object, it informs the Connection object's ExceptionListener, if one has been registered. It does this by calling the listener's onException method, passing it a JMSException argument describing the problem. "

        So certainly execution exceptions which only invalidates the session should not be notified via the exception listener.

        2. Going further if one looks at the AMQConnection.java exceptionReceived() method, it again looks wrong to me.
        It seems that an execution exception can in fact close the underlying connection (and the rest of the sessions) due to the following piece of logic.

        if (hardError(cause))
        {
        closer = (!_closed.getAndSet(true)) || closer;

        { _logger.info("Closing AMQConnection due to :" + cause); }

        }

        Digging further for any AMQException hardError method will return true.
        Therefore we need to rework that piece of code.

        3. The 0-10 code client (if using the Java IO transport) has a separate code path to notify the Connection if a connection level error is there.
        Therefore we definitely do not need to notify anything from the session.

        I believe the MINA transport notifies the IOException etc via the ProtocolSession, perhaps the reason for this code in the first place.
        For now I believe we need to do two things.

        1. Remove the following line from AMQSession_0_10.setCurrentException(SessionException se) method
        _connection.exceptionReceived(amqe);

        2. Fix the AMQConnection.isHardError method and the AMQException.isHardError method.

        Show
        Rajith Attapattu added a comment - - edited After looking at the code, I believe the exception handling is not exactly correct. 1. For starters we shouldn't really notify the connection's exception listener when there is a session level error. If we avoid that we can certainly get rid of the deadlock. The JMS API doc clearly states, "If a JMS provider detects a serious problem with a Connection object, it informs the Connection object's ExceptionListener, if one has been registered. It does this by calling the listener's onException method, passing it a JMSException argument describing the problem. " So certainly execution exceptions which only invalidates the session should not be notified via the exception listener. 2. Going further if one looks at the AMQConnection.java exceptionReceived() method, it again looks wrong to me. It seems that an execution exception can in fact close the underlying connection (and the rest of the sessions) due to the following piece of logic. if (hardError(cause)) { closer = (!_closed.getAndSet(true)) || closer; { _logger.info("Closing AMQConnection due to :" + cause); } } Digging further for any AMQException hardError method will return true. Therefore we need to rework that piece of code. 3. The 0-10 code client (if using the Java IO transport) has a separate code path to notify the Connection if a connection level error is there. Therefore we definitely do not need to notify anything from the session. I believe the MINA transport notifies the IOException etc via the ProtocolSession, perhaps the reason for this code in the first place. For now I believe we need to do two things. 1. Remove the following line from AMQSession_0_10.setCurrentException(SessionException se) method _connection.exceptionReceived(amqe); 2. Fix the AMQConnection.isHardError method and the AMQException.isHardError method.
        Hide
        Rajith Attapattu added a comment -

        I used review board to post the patch just to get the hang of it.
        https://reviews.apache.org/r/642/

        Show
        Rajith Attapattu added a comment - I used review board to post the patch just to get the hang of it. https://reviews.apache.org/r/642/
        Hide
        Rajith Attapattu added a comment -

        Committed Gordon's suggested fix.
        It seems for some situations the exception listener is the only way to get notified about a session exception.
        However there was a bug which prevented exceptions being notified in some cases(see QPID-3233) and has since been fixed.
        Therefore I believe Gordon's suggestion is probably the correct way to go.

        Show
        Rajith Attapattu added a comment - Committed Gordon's suggested fix. It seems for some situations the exception listener is the only way to get notified about a session exception. However there was a bug which prevented exceptions being notified in some cases(see QPID-3233 ) and has since been fixed. Therefore I believe Gordon's suggestion is probably the correct way to go.
        Hide
        Rajith Attapattu added a comment -

        Fixed. Test cases that were previously deadlocking due to this issue have been working properly since the fix was committed.

        Show
        Rajith Attapattu added a comment - Fixed. Test cases that were previously deadlocking due to this issue have been working properly since the fix was committed.
        Rajith Attapattu made changes -
        Status Open [ 1 ] Resolved [ 5 ]
        Resolution Fixed [ 1 ]

          People

          • Assignee:
            Rajith Attapattu
            Reporter:
            Rajith Attapattu
          • Votes:
            0 Vote for this issue
            Watchers:
            0 Start watching this issue

            Dates

            • Created:
              Updated:
              Resolved:

              Development