Description
Hello,
It seems that a duplex network connector is not re-established properly after a network failure, which causes messages to be lost. The issue is similar to https://issues.jboss.org/browse/MB-385 and https://issues.apache.org/jira/browse/AMQ-1973.
Below are the steps to reproduce the issue. I could probably create a JUnit test, but I'm unsure how to do this.
Setup
Let's assume that we have two hosts: hostA (192.168.0.1) and hostB (192.168.0.2).
hostA
Run brokerA with the default configuration that ships with the installation package.
hostB
Run brokerB either embedded or standalone. I'm using an embedded broker with the following code:
BrokerService broker = new BrokerService();
NetworkConnector connector = broker.addNetworkConnector("static:(failover:tcp://192.168.0.1:61616?wireFormat.maxInactivityDuration=18000000)");
connector.setDuplex(true);
connector.setConduitSubscriptions(false);
Startup
In the brokerA log you should see the following:
INFO | Connector vm://brokerA started
INFO | Started responder end of duplex bridge NC@ID:hostA-34744-1421847312144-0:1
INFO | Network connection between vm://brokerA#0 and tcp:///192.168.0.2:65463@61616 (brokerB) has been established.
In the brokerB log you should see the following:
2015-01-21 14:35:12,824 INFO triggerStartAsyncNetworkBridgeCreation: remoteBroker=unconnected, localBroker= vm://brokerB#0 org.apache.activemq.network.DemandForwardingBridgeSupport - Network connection between vm://brokerB#0 and tcp://192.168.0.1:61616?wireFormat.maxInactivityDuration=18000000 (brokerA) has been established.
Test
On hostB, run this command to simulate a network failure:
iptables -I OUTPUT -d 192.168.0.1 -p tcp --dport 61616 -j REJECT --reject-with=icmp-host-unreachable
Wait until you see the following on brokerB:
2015-01-21 14:51:52,735 WARN [ActiveMQ InactivityMonitor Worker] org.apache.activemq.transport.failover.FailoverTransport - Transport (tcp://192.168.0.1:61616) failed, reason: org.apache.activemq.transport.InactivityIOException: Channel was inactive for too (>30000) long: tcp://192.168.0.1:61616, attempting to automatically reconnect
and on brokerA:
WARN | Network connection between vm://brokerA#0 and tcp:///192.168.0.2:65463@61616 shutdown due to a remote error: org.apache.activemq.transport.InactivityIOException: Channel was inactive for too (>30000) long: tcp://192.168.0.2:65463
INFO | Connector vm://brokerA stopped
INFO | brokerA bridge to brokerB stopped
On hostB, remove the rule to restore the network:
iptables -D OUTPUT -d 192.168.0.1 -p tcp --dport 61616 -j REJECT --reject-with=icmp-host-unreachable
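Before reading the logs, it may help to confirm from hostB that port 61616 on hostA is reachable again once the rule has been removed. The following is only a minimal sketch using plain java.net sockets; the class name PortProbe and the 3-second timeout are my own assumptions and are not part of ActiveMQ.

```java
import java.io.IOException;
import java.net.InetSocketAddress;
import java.net.Socket;

// Hypothetical helper (not part of ActiveMQ): checks whether a TCP port accepts connections.
final class PortProbe {

    // Returns true if a TCP connection to host:port succeeds within timeoutMillis.
    static boolean isReachable(String host, int port, int timeoutMillis) {
        try (Socket socket = new Socket()) {
            socket.connect(new InetSocketAddress(host, port), timeoutMillis);
            return true;
        } catch (IOException e) {
            // Covers refused connections, unreachable hosts and connect timeouts.
            return false;
        }
    }

    public static void main(String[] args) {
        // Address and port of brokerA as used in this report; the timeout is an assumed value.
        System.out.println("brokerA reachable: " + isReachable("192.168.0.1", 61616, 3000));
    }
}
```

While the REJECT rule is in place the probe is expected to report false when run on hostB, and true after the rule is deleted. It only checks TCP connectivity, not the state of the bridge.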
Outcome and summary
In the brokerB log you should see the following:
2015-01-21 14:52:03,588 INFO [ActiveMQ Task-3] org.apache.activemq.transport.failover.FailoverTransport - Successfully reconnected to tcp://192.168.0.1:61616?wireFormat.maxInactivityDuration=18000000
On brokerA there is no sign that the responder end of the duplex bridge is restarted. Consumers connected to brokerA can't receive messages sent to brokerB.
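As a rough way to check this outcome automatically, the brokerA log can be scanned for the two messages quoted above. This is only a sketch: the class DuplexBridgeLogCheck is a hypothetical helper of mine, not part of ActiveMQ, and it assumes the log wording matches the excerpts in this report ("Started responder end of duplex bridge" and "bridge to brokerB stopped") and that the log file path is passed as the first argument.

```java
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Paths;
import java.util.List;

// Hypothetical helper (not part of ActiveMQ): inspects brokerA log lines in order.
final class DuplexBridgeLogCheck {

    // Message fragments copied from the brokerA log excerpts in this report.
    static final String STARTED = "Started responder end of duplex bridge";
    static final String STOPPED = "bridge to brokerB stopped";

    // Returns true if the most recent bridge event in the log is a responder start.
    static boolean responderRunningAtEnd(List<String> lines) {
        boolean running = false;
        for (String line : lines) {
            if (line.contains(STARTED)) {
                running = true;
            } else if (line.contains(STOPPED)) {
                running = false;
            }
        }
        return running;
    }

    public static void main(String[] args) throws IOException {
        // args[0]: path to the brokerA log file (assumed, depends on the installation).
        List<String> lines = Files.readAllLines(Paths.get(args[0]));
        System.out.println("responder end running: " + responderRunningAtEnd(lines));
    }
}
```

With the brokerA log from the steps above, the check reports false after the network is restored, because no new "Started responder end of duplex bridge" line follows the "stopped" line.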