ActiveMQ
  1. ActiveMQ
  2. AMQ-2320

Session are not deleted when several processes terminate simultaneously in a "network of broker" complex configuration

    Details

    • Type: Bug Bug
    • Status: Closed
    • Priority: Major Major
    • Resolution: Incomplete
    • Affects Version/s: 5.2.0
    • Fix Version/s: 5.x
    • Component/s: Broker
    • Labels:
      None
    • Environment:

      Linux Redhat. JVM 1.6. AMQ 5.2

      Description

      I would like to add a schema. I hope I will be able to add it later. I don't see it

      I have 3 processes with one embedded broker (BSV, client, server) in the same network of brokers (non duplex but bi-directionnal).
      I have a fourth standalone broker
      My "client" and my "server" process are connected a second time with a "manual" tcp connection to the BSV process.
      My "client" and my "server" process are connected a third time with a "manual" tcp connection on the standalone broker.

      In this case, if my process BSV is stopped by an interrupt (CTRL-C) (which closes the embedded connection, and the embedded broker), and if simultaneously, I want to stop gracefully my "client" process (close all session, close all "manual" connection, stop embedded broker), one session thread stays up and never terminates

      "ActiveMQ Session: ID:td0sib01s.priv.atos.fr-51590-1247070640728-0:2:3" prio=10 tid=0x000000000e83a400 nid=0x5749 in Object.wait() [0x00000000469c8000..0x00000000469c8d10]
      java.lang.Thread.State: WAITING (on object monitor)
      at java.lang.Object.wait(Native Method)

      • waiting on <0x00002aaaca868380> (a java.lang.Object)
        at java.lang.Object.wait(Object.java:502)
        at org.apache.activemq.thread.DedicatedTaskRunner.runTask(DedicatedTaskRunner.java:105)
      • locked <0x00002aaaca868380> (a java.lang.Object)
        at org.apache.activemq.thread.DedicatedTaskRunner$1.run(DedicatedTaskRunner.java:36)

      It is a session associated with the BSV link (I don't know if it is a session associated with the network of broker, or a session associated with my manual connection)

      A CTRL-C on the process terminates the process

      It doesn't happen if stop are not simultaneously done.

      Eric-AWL

      1. Pb AMQ Session.JPG
        49 kB
        Eric
      2. screenshot-1.jpg
        320 kB
        Eric

        Activity

        Hide
        Eric added a comment - - edited

        The schema. All 4 processes are on the same server.

        I think I forgot to say that the session stays up in the "client" process when BSV and client want to stop simultaneously.

        Show
        Eric added a comment - - edited The schema. All 4 processes are on the same server. I think I forgot to say that the session stays up in the "client" process when BSV and client want to stop simultaneously.
        Hide
        Gary Tully added a comment -

        can you verify your scenario with the current 5.3-SNAPSHOT?

        The session thread is associated with one of the vm or tcp 'manual' connections, it does not occur in the network connector logic.
        Are you using finally blocks to close off connections and sessions?
        The trick here is trying to reproduce this.
        There is a disconnect test org.apache.activemq.usecases.BrokerQueueNetworkWithDisconnectTest[1] that uses a socket proxy to simulate a broker going down, this could provide a basis for a junit test that could help reproduce but it will be tricky.
        [1] http://svn.apache.org/repos/asf/activemq/trunk/activemq-core/src/test/java/org/apache/activemq/usecases/BrokerQueueNetworkWithDisconnectTest.java

        Show
        Gary Tully added a comment - can you verify your scenario with the current 5.3-SNAPSHOT? The session thread is associated with one of the vm or tcp 'manual' connections, it does not occur in the network connector logic. Are you using finally blocks to close off connections and sessions? The trick here is trying to reproduce this. There is a disconnect test org.apache.activemq.usecases.BrokerQueueNetworkWithDisconnectTest [1] that uses a socket proxy to simulate a broker going down, this could provide a basis for a junit test that could help reproduce but it will be tricky. [1] http://svn.apache.org/repos/asf/activemq/trunk/activemq-core/src/test/java/org/apache/activemq/usecases/BrokerQueueNetworkWithDisconnectTest.java
        Hide
        Eric added a comment -

        Hi !

        I didn't make the tests you asked me. But I have some news.

        I used ConsumerEventSource objects to track consumers activity on JMS destinations. And I found that I didn't manage correctly the stop() calls to these objects.

        I corrected my code. I think now that all my JMS ressources (consumers,producers, sessions, connections), brokers, and now ConsumerEventSource objects, are stopped and closed when I ask my processes to disconnect from my brokers correctly, or when my connection exceptionListeners discover that my network connectivity is broken (in this case, JMSException thrown by JMS calls to close() are ignored).

        In these conditions, it seems that all ActiveMQ Session threads are correctly stopped without having to interrupt them. I have to do more tests to confirm this.

        I don't know if this is the waited behaviour or not.

        Eric-AWL

        Show
        Eric added a comment - Hi ! I didn't make the tests you asked me. But I have some news. I used ConsumerEventSource objects to track consumers activity on JMS destinations. And I found that I didn't manage correctly the stop() calls to these objects. I corrected my code. I think now that all my JMS ressources (consumers,producers, sessions, connections), brokers, and now ConsumerEventSource objects, are stopped and closed when I ask my processes to disconnect from my brokers correctly, or when my connection exceptionListeners discover that my network connectivity is broken (in this case, JMSException thrown by JMS calls to close() are ignored). In these conditions, it seems that all ActiveMQ Session threads are correctly stopped without having to interrupt them. I have to do more tests to confirm this. I don't know if this is the waited behaviour or not. Eric-AWL
        Hide
        Eric added a comment -

        Hi !

        I believed I found the major reason of the active threads. But, no, ....
        In the same conditions (simultaneous stops) I had a case where one Session thread was active again (like before I discover that I forgot to stop the consumerEventSource object).

        And I had a case where the consumerEventSource.stop was blocked without deadlock detected, and where CTRL-C doesn't interrupt it. Only Kill -9 interrupted the process.

        Here is the stack trace of my shutdown hook which is called on a CTRL-C interruption.

        Name: Thread-1
        State: WAITING on java.lang.Object@9866417
        Total blocked: 0 Total waited: 1

        Stack trace:
        java.lang.Object.wait(Native Method)
        org.apache.activemq.thread.DedicatedTaskRunner.shutdown(DedicatedTaskRunner.java:72)
        org.apache.activemq.thread.DedicatedTaskRunner.shutdown(DedicatedTaskRunner.java:83)
        org.apache.activemq.ActiveMQSessionExecutor.stop(ActiveMQSessionExecutor.java:142)
        org.apache.activemq.ActiveMQSession.dispose(ActiveMQSession.java:579)

        • locked org.apache.activemq.ActiveMQSession@22cf38a2
          org.apache.activemq.ActiveMQSession.close(ActiveMQSession.java:555)
          org.apache.activemq.advisory.ConsumerEventSource.stop(ConsumerEventSource.java:83)
          atosbus.transport.jms.CJMSTransport$CAMQConsumerEventServiceLink.stop(CJMSTransport.java:190)
          atosbus.transport.jms.CJMSTransport.unregisterService(CJMSTransport.java:300)
        • locked atosbus.transport.jms.CJMSTransport@6a3b8b49
          atosbus.core.service.CService.setServiceState(CService.java:1044)
          atosbus.core.service.CService.releaseAllResources(CService.java:2137)
          atosbus.core.system.CManagedComponent.removeService(CManagedComponent.java:465)
        • locked atosbus.core.service.CService@40363068
        • locked atosbus.core.system.CInstance@2143ed74
          atosbus.core.system.CManagedComponent.removeAllServices(CManagedComponent.java:513)
          atosbus.core.system.CInstance.removeAllApplicativeServices(CInstance.java:433)
          atosbus.core.system.CInstance.disconnectBus(CInstance.java:296)
          atosbus.core.system.CComponent$CComponentShutdownHook.run(CComponent.java:159)

        And I put here an image of the JMX console thread wiew which shows that there is no deadlock .

        Eric-AWL

        Show
        Eric added a comment - Hi ! I believed I found the major reason of the active threads. But, no, .... In the same conditions (simultaneous stops) I had a case where one Session thread was active again (like before I discover that I forgot to stop the consumerEventSource object). And I had a case where the consumerEventSource.stop was blocked without deadlock detected, and where CTRL-C doesn't interrupt it. Only Kill -9 interrupted the process. Here is the stack trace of my shutdown hook which is called on a CTRL-C interruption. Name: Thread-1 State: WAITING on java.lang.Object@9866417 Total blocked: 0 Total waited: 1 Stack trace: java.lang.Object.wait(Native Method) org.apache.activemq.thread.DedicatedTaskRunner.shutdown(DedicatedTaskRunner.java:72) org.apache.activemq.thread.DedicatedTaskRunner.shutdown(DedicatedTaskRunner.java:83) org.apache.activemq.ActiveMQSessionExecutor.stop(ActiveMQSessionExecutor.java:142) org.apache.activemq.ActiveMQSession.dispose(ActiveMQSession.java:579) locked org.apache.activemq.ActiveMQSession@22cf38a2 org.apache.activemq.ActiveMQSession.close(ActiveMQSession.java:555) org.apache.activemq.advisory.ConsumerEventSource.stop(ConsumerEventSource.java:83) atosbus.transport.jms.CJMSTransport$CAMQConsumerEventServiceLink.stop(CJMSTransport.java:190) atosbus.transport.jms.CJMSTransport.unregisterService(CJMSTransport.java:300) locked atosbus.transport.jms.CJMSTransport@6a3b8b49 atosbus.core.service.CService.setServiceState(CService.java:1044) atosbus.core.service.CService.releaseAllResources(CService.java:2137) atosbus.core.system.CManagedComponent.removeService(CManagedComponent.java:465) locked atosbus.core.service.CService@40363068 locked atosbus.core.system.CInstance@2143ed74 atosbus.core.system.CManagedComponent.removeAllServices(CManagedComponent.java:513) atosbus.core.system.CInstance.removeAllApplicativeServices(CInstance.java:433) atosbus.core.system.CInstance.disconnectBus(CInstance.java:296) atosbus.core.system.CComponent$CComponentShutdownHook.run(CComponent.java:159) And I put here an image of the JMX console thread wiew which shows that there is no deadlock . Eric-AWL
        Hide
        Eric added a comment -

        No deadlock, but ConsumerEventSource is blocked into a shutdown hook.

        Show
        Eric added a comment - No deadlock, but ConsumerEventSource is blocked into a shutdown hook.
        Hide
        Eric added a comment -

        I found the lock. It wasn't a deadlock detected by JVM, but something similar in my code. (I received a consumerEvent which tried to acquire a lock on a object, I locked in the the thread which launched the consumerEvent.stop() and wait that the event was consumed).

        But the first problem is always here (One session which doesn't want to stop).

        Is it possible that this is the same problem as AMQ-2327 ?

        I have a circular references configuration, and use networkTTL =1, conduitSubscription=false, dynamicOnly=true to avoid problems.

        I will try with 5.3-SNAPSHOT (the last one).

        Eric-AWL

        Show
        Eric added a comment - I found the lock. It wasn't a deadlock detected by JVM, but something similar in my code. (I received a consumerEvent which tried to acquire a lock on a object, I locked in the the thread which launched the consumerEvent.stop() and wait that the event was consumed). But the first problem is always here (One session which doesn't want to stop). Is it possible that this is the same problem as AMQ-2327 ? I have a circular references configuration, and use networkTTL =1, conduitSubscription=false, dynamicOnly=true to avoid problems. I will try with 5.3-SNAPSHOT (the last one). Eric-AWL
        Hide
        Eric added a comment -

        I used Fuse version 5.3.0.4 where the AMQ-2327 problem is corrected.

        I always have some sessions which are up .... I found a way to reproduce the problem but it's not easy to make a smaller test configuration.

        I have written a workaround by calling interrupt() on these threads at the end of my program. But it's not very clean !!! (it works well)

        I will try to find time to make the JUnit test..

        Eric-AWL

        Show
        Eric added a comment - I used Fuse version 5.3.0.4 where the AMQ-2327 problem is corrected. I always have some sessions which are up .... I found a way to reproduce the problem but it's not easy to make a smaller test configuration. I have written a workaround by calling interrupt() on these threads at the end of my program. But it's not very clean !!! (it works well) I will try to find time to make the JUnit test.. Eric-AWL
        Hide
        Gary Tully added a comment -
        Show
        Gary Tully added a comment - good that it works fyi: for inspiration on a unit test; have a look at: http://svn.apache.org/viewvc/activemq/trunk/activemq-core/src/test/java/org/apache/activemq/usecases/ThreeBrokerQueueNetworkTest.java?view=co
        Hide
        Timothy Bish added a comment -

        No test was ever provided and no other reports of this so far.

        Show
        Timothy Bish added a comment - No test was ever provided and no other reports of this so far.

          People

          • Assignee:
            Unassigned
            Reporter:
            Eric
          • Votes:
            0 Vote for this issue
            Watchers:
            1 Start watching this issue

            Dates

            • Created:
              Updated:
              Resolved:

              Development