Description
We started using durable subscribers a week ago. After the 4th durable subscriber was unsubscribed (after 1 hour of inactivity), the system deadlocked. If the "Durable Subscriber Cleanup Timer" goes off at the wrong time, the entire server locks up.
Setup:
ActiveMQ 5.7.0 with master/slave using the JDBC store
Approx 3 - 5 concurrent durable subscribers
Approx 5 messages / sec
ActiveMQ checks every minute for durable subscribers that have been offline for 1 hour.
Locked threads:
"ActiveMQ Transport: tcp:///79.125.71.104:48082@8090":
at org.apache.activemq.broker.region.Topic.doMessageSend(Topic.java:446)
- waiting to lock <0x00000000c6228480> (a org.apache.activemq.broker.region.Topic)
at org.apache.activemq.broker.region.Topic.send(Topic.java:427)
at org.apache.activemq.broker.region.AbstractRegion.send(AbstractRegion.java:407)
at org.apache.activemq.broker.region.RegionBroker.send(RegionBroker.java:503)
"ActiveMQ Transport: tcp:///79.125.71.104:47590@8090":
at org.apache.activemq.broker.region.PrefetchSubscription.add(PrefetchSubscription.java:142)
- waiting to lock <0x00000000c66ba050> (a java.lang.Object)
at org.apache.activemq.broker.region.DurableTopicSubscription.add(DurableTopicSubscription.java:243)
at org.apache.activemq.broker.region.policy.StrictOrderDispatchPolicy.dispatch(StrictOrderDispatchPolicy.java:58)
"ActiveMQ Durable Subscriber Cleanup Timer":
at org.apache.activemq.broker.region.Topic.deactivate(Topic.java:288)
- waiting to lock <0x00000000c6250670> (a java.util.concurrent.CopyOnWriteArrayList)
at org.apache.activemq.broker.region.DurableTopicSubscription.deactivate(DurableTopicSubscription.java:184)
- locked <0x00000000c66ba060> (a java.lang.Object)
- locked <0x00000000c66ba050> (a java.lang.Object)
at org.apache.activemq.broker.region.Topic.deleteSubscription(Topic.java:195)
at org.apache.activemq.broker.region.TopicRegion.removeSubscription(TopicRegion.java:199)
at org.apache.activemq.broker.region.TopicRegion.doCleanup(TopicRegion.java:99)
I have attached a patch which fixes the problem.
Since there is only one dispatch policy per Topic, dispatch can synchronise on the DispatchPolicy instead of on the consumers object, which is the lock that causes the deadlock.
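To illustrate the idea behind the patch, here is a minimal, self-contained sketch. It is not the attached patch and it does not use ActiveMQ classes. The names `DispatchLockSketch`, `subscriptionLock`, `consumers`, `dispatchPolicy` and `runBoth` are hypothetical stand-ins for the objects seen in the thread dump. It assumes the dispatch path previously held the consumers lock while calling into the subscription, which the excerpt above does not show directly.

```java
import java.util.List;
import java.util.concurrent.CopyOnWriteArrayList;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.TimeUnit;

public class DispatchLockSketch {
    // Hypothetical stand-ins for the broker objects named in the thread dump.
    static final Object subscriptionLock = new Object();                 // lock taken when adding to a subscription
    static final List<String> consumers = new CopyOnWriteArrayList<>(); // the topic's consumer list
    static final Object dispatchPolicy = new Object();                   // one policy instance per topic

    // Dispatch path: serialise on the policy rather than on the consumers list.
    // Lock order here is dispatchPolicy -> subscriptionLock.
    static void dispatch(String message) {
        synchronized (dispatchPolicy) {
            for (String consumer : consumers) {
                synchronized (subscriptionLock) {
                    // Delivery of message to consumer is omitted in this sketch.
                }
            }
        }
    }

    // Cleanup path: same order as the cleanup timer trace,
    // subscriptionLock -> consumers.
    static void cleanup(String consumer) {
        synchronized (subscriptionLock) {
            synchronized (consumers) {
                consumers.remove(consumer);
            }
        }
    }

    // Runs both paths concurrently; returns true if both finish within 10 seconds.
    static boolean runBoth(int iterations) {
        ExecutorService pool = Executors.newFixedThreadPool(2);
        pool.submit(() -> {
            for (int i = 0; i < iterations; i++) {
                dispatch("m" + i);
            }
        });
        pool.submit(() -> {
            for (int i = 0; i < iterations; i++) {
                consumers.add("sub");
                cleanup("sub");
            }
        });
        pool.shutdown();
        try {
            return pool.awaitTermination(10, TimeUnit.SECONDS);
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            return false;
        }
    }

    public static void main(String[] args) {
        System.out.println(runBoth(10_000) ? "finished" : "stuck");
    }
}
```

Because the dispatch path never holds the `consumers` monitor, the two paths cannot form a lock cycle. If `dispatch` synchronised on `consumers` instead, its order would be consumers then subscription lock, the reverse of the cleanup path, which is the classic condition for a deadlock.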