I think your solution is bad. How can I stop the master? Stopping a working master means that my business process will crash. What is the problem if it is just a network problem and not a server crash? In that case the slave must automatically reconnect to the master and replicate its state; it must not shut down.
Data replication must be independent of the Command architecture. Propagating commands to the slave is bad architecture: in effect, the absence of commands means the absence of data replication.
As to AMQ-1825, I am not sure this patch will solve your problems. I haven't run into those.
As to my patch, if you don't use shutdownOnSlaveFailure, everything will work the same as without the patch. Stopping one master is OK when you configure a network of master/slave pairs (e.g., 3 pairs) and have the redelivery policy set up properly, so that when an error happens on one pair, all the messages are sent to another master/slave pair. We want it this way to make sure the master and slave are in sync.
No. I want to use the same configuration as yours: Datacenter1 (Master A / Slave B) <=> Datacenter2 (Master B / Slave A)
But if you shut down a master that loses its connection to the slave, some messages held by that master won't be relocated to another master in time. I think a better solution is full replication of the master's state at the moment the slave attaches to the master. Also, the slave mustn't shut down on losing its connection to the master, because it may be just a network problem. The slave must keep retrying to reconnect to the master for an hour, for example, and only then shut down. It is too hard for our operations team to restart the ActiveMQ server each time the network is down. I think the master must keep working when the slave is disconnected. I derived a slave BrokerService and added reconnection logic. The problem for me is that the master's state is not fully replicated when the slave attaches.
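To make the "retry for a while, then shut down" behaviour concrete, here is a minimal sketch of a bounded reconnect loop. It is not the code from the derived BrokerService mentioned above, and it does not use any ActiveMQ API: the class `ReconnectSketch`, the method `retryUntilDeadline`, and the `attempt` callback are hypothetical names introduced only for illustration. The assumption is that a single connection attempt can be expressed as a callback that returns true on success.

```java
import java.util.concurrent.TimeUnit;
import java.util.function.BooleanSupplier;

// Hypothetical helper for illustration only; not part of ActiveMQ.
class ReconnectSketch {

    /**
     * Calls attempt repeatedly until it succeeds or maxWaitMillis has elapsed.
     * At least one attempt is always made.
     * Returns true once connected, false when the caller should give up
     * (for example, by shutting the slave down).
     */
    static boolean retryUntilDeadline(BooleanSupplier attempt,
                                      long maxWaitMillis,
                                      long delayMillis) {
        long deadline = System.nanoTime() + TimeUnit.MILLISECONDS.toNanos(maxWaitMillis);
        while (true) {
            if (attempt.getAsBoolean()) {
                return true; // reconnected to the master
            }
            if (deadline - System.nanoTime() <= 0) {
                return false; // retry window exhausted
            }
            try {
                Thread.sleep(delayMillis); // wait before the next attempt
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt(); // preserve interrupt status
                return false;
            }
        }
    }

    public static void main(String[] args) {
        // Simulated master that becomes reachable on the third attempt.
        int[] attempts = {0};
        boolean connected = retryUntilDeadline(() -> ++attempts[0] >= 3, 1000, 10);
        System.out.println("connected=" + connected + " after " + attempts[0] + " attempts");
    }
}
```

With a one-hour window and, say, a 30-second delay, the call would be `retryUntilDeadline(attempt, 3_600_000, 30_000)`; the slave would shut down only if that call returns false.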
AMQ-596, AMQ-1832 and AMQ-1820 so that the network of pure master/slave pairs topology can be reliable for use in a production environment. What these issues have in common is that we want to make sure the master and slave are always in sync, regardless of which one starts first. Please take a look at the patch I provided. I hope that it can be incorporated into the source so that it will be easy for us to maintain. Please also advise if there is a better way to code the fix for these issues. I would like to contribute to fixing them.
NOTE: For AMQ-1820, the fix was found by another user and I simply incorporated it in this patch. In testing, without the change in AbstractRegion.java, the AMQ-1820 exception still happens. That one-line change seems to be the fix.