Description
java.lang.AssertionError: Delivering count of a source queue should be zero on connection failure expected:<0> but was:<1> at org.junit.Assert.fail(Assert.java:88) at org.junit.Assert.failNotEquals(Assert.java:743) at org.junit.Assert.assertEquals(Assert.java:118) at org.junit.Assert.assertEquals(Assert.java:555) at org.apache.activemq.artemis.tests.integration.cluster.bridge.BridgeReconnectTest.testDeliveringCountOnBridgeConnectionFailure(BridgeReconnectTest.java:688)
18:25:43,722 WARN [org.apache.activemq.artemis.core.server] AMQ222094: Bridge unable to send message Reference[22]:NON-RELIABLE:ServerMessage[messageID=22,durable=false,userID=null,priority=4, bodySize=79, timestamp=Tue Feb 02 18:25:43 EST 2016,expiration=0, durable=false, address=testAddress,properties=TypedProperties[propkey=18,_AMQ_BRIDGE_DUP=[47A6 779A CA04 11E5 9C91 A169 FCCB 5522 0000 0000 0000 0016)]]@1861263739, will try again once bridge reconnects: ActiveMQObjectClosedException[errorType=OBJECT_CLOSED message=AMQ119018: Producer is closed] at org.apache.activemq.artemis.core.client.impl.ClientProducerImpl.checkClosed(ClientProducerImpl.java:298) [artemis-core-client-1.1.0.wildfly-012.jar:] at org.apache.activemq.artemis.core.client.impl.ClientProducerImpl.send(ClientProducerImpl.java:122) [artemis-core-client-1.1.0.wildfly-012.jar:] at org.apache.activemq.artemis.core.server.cluster.impl.BridgeImpl.deliverStandardMessage(BridgeImpl.java:698) [artemis-server-1.1.0.wildfly-012.jar:] at org.apache.activemq.artemis.core.server.cluster.impl.BridgeImpl.handle(BridgeImpl.java:574) [artemis-server-1.1.0.wildfly-012.jar:] at org.apache.activemq.artemis.core.server.impl.QueueImpl.handle(QueueImpl.java:2410) [artemis-server-1.1.0.wildfly-012.jar:] at org.apache.activemq.artemis.core.server.impl.QueueImpl.deliver(QueueImpl.java:1813) [artemis-server-1.1.0.wildfly-012.jar:] at org.apache.activemq.artemis.core.server.impl.QueueImpl.access$1400(QueueImpl.java:97) [artemis-server-1.1.0.wildfly-012.jar:] at org.apache.activemq.artemis.core.server.impl.QueueImpl$DeliverRunner.run(QueueImpl.java:2581) [artemis-server-1.1.0.wildfly-012.jar:] at org.apache.activemq.artemis.utils.OrderedExecutorFactory$OrderedExecutor$ExecutorTask.run(OrderedExecutorFactory.java:100) [artemis-core-client-1.1.0.wildfly-012.jar:] at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1153) [rt.jar:1.8.0-internal] at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:628) [rt.jar:1.8.0-internal] at java.lang.Thread.run(Thread.java:785) [vm.jar:1.8.0-internal]
I've investigated this issue and I found the race condition which causes mentioned fail. Problem lies in [1]. When bridge detects some problem, it calls connectionFailed method which call for every message in refs the Queue.cancel(ref, timeBase). Queue.cancel decreases deliveryCount for canceled message. However before this step, we remove reference on actual message from refs on line 20, so for this message the deliveryCount is not decreased. This is correct behavior, because for this message we return HandleStatus.BUSY. I think that problem is in QueueImpl#deliver method. If bridge returns HandleStatus.BUSY we should decrease deliveryCount. So I think that instead of [2], there should be [3].
[1]
private BridgeImpl#HandleStatus deliverStandardMessage(SimpleString dest, final MessageReference ref, ServerMessage message) { // if we failover during send then there is a chance that the // that this will throw a disconnect, we need to remove the message // from the acks so it will get resent, duplicate detection will cope // with any messages resent if (ActiveMQServerLogger.LOGGER.isTraceEnabled()) { ActiveMQServerLogger.LOGGER.trace("going to send message: " + message + " from " + this.getQueue()); } try { producer.send(dest, message); } catch (final ActiveMQException e) { ActiveMQServerLogger.LOGGER.bridgeUnableToSendMessage(e, ref); synchronized (refs) { // We remove this reference as we are returning busy which means the reference will never leave the Queue. // because of this we have to remove the reference here refs.remove(message.getMessageID()); } connectionFailed(e, false); return HandleStatus.BUSY; } return HandleStatus.HANDLED; }
[2]
else if (status == HandleStatus.BUSY) { holder.iter.repeat(); noDelivery++; }
[3]
else if (status == HandleStatus.BUSY) { decDelivering(); holder.iter.repeat(); noDelivery++; }
Steps to reproduce:
1. cd tests
2. while true; do mvn -Dtest=BridgeReconnectTest#testDeliveringCountOnBridgeConnectionFailure -Ptests -DfailIfNoTests=false test; done