Issue Details

Key: AMQ-1832
Type: Improvement
Status: Resolved
Resolution: Fixed
Priority: Major
Assignee: Rob Davies
Reporter: ying
Votes: 0
Watchers: 1

ActiveMQ

Pure Master/Slave to allow shutdownOnSlaveFailure to be configured on master

Created: 30/Jun/08 12:55 PM   Updated: 02/Sep/08 04:49 AM
Component/s: None
Affects Version/s: None
Fix Version/s: 5.2.0

Time Tracking:
Not Specified

File Attachments:
AMQ1832-596-1820Patch.txt (text file, 9 kb, licensed for inclusion in ASF works), attached by ying on 2008-07-11 10:24 AM


Description
This is related to AMQ-596: master and slave should always be in sync, and when both are down, failover can direct the work to another master/slave pair. Pure master/slave then provides replication, while a network of master/slave pairs provides a highly available (HA) service.
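The routing rule behind this topology can be modelled in a minimal sketch. The classes `BrokerPair` and `PairFailover` are hypothetical and are not part of ActiveMQ; in a real deployment the client's failover transport chooses the broker. The sketch only shows the idea: keep using a pair while its master (or, after a takeover, its slave) is up, and move on to the next pair once both are down.

```java
import java.util.List;
import java.util.Optional;

// Hypothetical model of one pure master/slave pair (not an ActiveMQ class).
final class BrokerPair {
    final String name;
    final boolean masterUp;
    final boolean slaveUp;

    BrokerPair(String name, boolean masterUp, boolean slaveUp) {
        this.name = name;
        this.masterUp = masterUp;
        this.slaveUp = slaveUp;
    }

    // Clients use the master; the slave only serves after the master has failed.
    // An empty result means the whole pair is down.
    Optional<String> activeBroker() {
        if (masterUp) {
            return Optional.of(name + "-master");
        }
        if (slaveUp) {
            return Optional.of(name + "-slave");
        }
        return Optional.empty();
    }
}

// Hypothetical client-side routing across a network of pairs.
final class PairFailover {
    // Returns the first broker, in configured order, that can accept work.
    static Optional<String> route(List<BrokerPair> pairs) {
        for (BrokerPair pair : pairs) {
            Optional<String> broker = pair.activeBroker();
            if (broker.isPresent()) {
                return broker;
            }
        }
        return Optional.empty();
    }

    public static void main(String[] args) {
        List<BrokerPair> pairs = List.of(
                new BrokerPair("pair1", false, false), // both brokers of pair1 are down
                new BrokerPair("pair2", true, true));
        System.out.println(route(pairs).orElse("no broker available")); // prints pair2-master
    }
}
```

Keeping each pair in lockstep is what makes skipping a dead pair safe: no pair holds messages that its partner has not seen.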

Comments
ying - 11/Jul/08 10:24 AM
I have been working on AMQ-596, AMQ-1832 and AMQ-1820 so that a network of pure master/slave pairs is reliable enough for production use. What these issues have in common is that we want master and slave to always be in sync, regardless of which one starts first.

Please take a look at the attached patch. I hope it can be incorporated into the source so that it is easier for us to maintain. Please also advise if there is a better way to fix these issues; I would like to contribute the fixes.

NOTE: The AMQ-1820 fix was found by another user; I simply incorporated it into this patch. In my testing, the AMQ-1820 exception still occurs without the change in AbstractRegion.java, so that one-line change appears to be the fix.


Dima - 12/Jul/08 02:14 AM - edited
To ying:

Do you think your patch could solve my problem, AMQ-1825?


Dima - 12/Jul/08 02:31 AM
I think your solution is bad. How can I stop the master? Stopping a working master means my business process will crash. What if the problem is just the network, not a server crash? In that case the slave should automatically reconnect to the master and replicate its state; it should not shut down.

Data replication should be independent of the Command architecture. Propagating commands to the slave is a poor design: in effect, a missing command means missing data replication.


Dima - 12/Jul/08 02:38 AM - edited
I think the PersistenceStore, not the broker itself, should be the master and have the slave.

ying - 14/Jul/08 06:12 AM
Regarding AMQ-1825: I am not sure this patch will solve your problems; I haven't run into those.

As for my patch: if you don't use shutdownOnSlaveFailure, everything works the same as without the patch.
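The choice the option controls on the master can be sketched as a small state model. `MasterBroker` and its methods are hypothetical and do not reproduce ActiveMQ's BrokerService code; the sketch assumes the option is off unless explicitly enabled, which matches the statement that behaviour is unchanged when it is not used.

```java
// Hypothetical model of the master-side decision controlled by shutdownOnSlaveFailure.
// Not ActiveMQ's BrokerService implementation.
final class MasterBroker {
    private final boolean shutdownOnSlaveFailure;
    private boolean running = true;
    private boolean slaveAttached = true;

    MasterBroker(boolean shutdownOnSlaveFailure) {
        this.shutdownOnSlaveFailure = shutdownOnSlaveFailure;
    }

    // Called when the connection to the slave is lost.
    void onSlaveFailure() {
        slaveAttached = false;
        if (shutdownOnSlaveFailure) {
            // Stop rather than run unreplicated, so master and slave cannot diverge;
            // clients are expected to fail over to another pair.
            running = false;
        }
        // With the option off, the master keeps serving clients without a slave.
    }

    boolean isRunning() { return running; }

    boolean isSlaveAttached() { return slaveAttached; }

    public static void main(String[] args) {
        MasterBroker lenient = new MasterBroker(false);
        MasterBroker strict = new MasterBroker(true);
        lenient.onSlaveFailure();
        strict.onSlaveFailure();
        System.out.println(lenient.isRunning() + " " + strict.isRunning()); // prints "true false"
    }
}
```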

Stopping one master is fine when you configure a network of master/slave pairs (e.g. 3 pairs) and set up the redelivery policy properly, so that when an error occurs on one pair, all messages are sent to another master/slave pair. We want it this way to make sure master and slave are always in sync.


Dima - 14/Jul/08 12:55 PM - edited
No. I want to use the same configuration as yours: Datacenter1 (Master A / Slave B) <=> Datacenter2 (Master B / Slave A).
But if you shut down a master that has lost its connection to the slave, some of the messages held by that master won't be relocated to another master in time.

I think a better solution is full replication of the master's state at the moment the slave attaches to the master. Also, the slave must not shut down when it loses its connection to the master, because it may just be a network problem. The slave should keep retrying to reconnect to the master for, say, an hour, and only then shut down.
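The proposed slave behaviour (keep retrying for a bounded window, then shut down) can be sketched as follows. `SlaveReconnectPolicy` and its parameters are hypothetical; the `tryConnect` callback stands in for whatever actually reconnects to the master, and the clock and sleep function are injected so the policy can be exercised without waiting an hour.

```java
import java.util.function.BooleanSupplier;
import java.util.function.LongConsumer;
import java.util.function.LongSupplier;

// Hypothetical retry-then-give-up policy for a slave that lost its master.
final class SlaveReconnectPolicy {
    // Returns true if a reconnect succeeded within the window,
    // false if the window ran out and the slave should shut down.
    static boolean reconnectWithin(BooleanSupplier tryConnect,
                                   long windowMillis,
                                   long retryDelayMillis,
                                   LongSupplier clock,
                                   LongConsumer sleeper) {
        long deadline = clock.getAsLong() + windowMillis;
        while (true) {
            if (tryConnect.getAsBoolean()) {
                return true; // master reachable again: resume replication
            }
            if (clock.getAsLong() + retryDelayMillis > deadline) {
                return false; // next attempt would fall outside the window
            }
            sleeper.accept(retryDelayMillis); // wait before the next attempt
        }
    }
}
```

With a one-hour window (3,600,000 ms) and a one-minute delay, a link that never comes back is tried 61 times before the method returns false; in production code the clock would be `System::currentTimeMillis` and the sleeper would wrap `Thread.sleep`.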

It's too hard for our operations team to restart the ActiveMQ server every time the network goes down.
Unfortunately, network problems happen very often. Sometimes power fails in an entire datacenter.

I think the master must keep working when the slave is disconnected. I subclassed BrokerService for the slave and added reconnection logic. My problem is that the master's state is not fully replicated when the slave attaches.
I think shutting down the master is a bad solution.
A crash of the hard drives in both servers is less probable.


Dima - 15/Jul/08 09:36 AM
I was wrong about reconnection: I can wrap the tcp transport in a failover transport. Unfortunately, that doesn't help with state synchronization.
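Wrapping a tcp URI in the failover transport is a matter of URI syntax: `failover:(tcp://host:port)?options`. The helper below is a hypothetical sketch that only builds such a string. `master-host` and port `62001` are placeholders, and `initialReconnectDelay`/`maxReconnectDelay` are failover transport options whose defaults and exact semantics depend on the ActiveMQ version in use. As noted above, reconnecting this way does not by itself resynchronize state.

```java
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.StringJoiner;

// Hypothetical helper: builds "failover:(<uri>)?k1=v1&k2=v2" for a single tcp URI.
final class FailoverUris {
    static String wrapInFailover(String tcpUri, Map<String, String> options) {
        StringBuilder uri = new StringBuilder("failover:(").append(tcpUri).append(')');
        if (!options.isEmpty()) {
            // Options go after the closing parenthesis, joined with '&'.
            StringJoiner query = new StringJoiner("&", "?", "");
            options.forEach((key, value) -> query.add(key + "=" + value));
            uri.append(query);
        }
        return uri.toString();
    }

    public static void main(String[] args) {
        Map<String, String> options = new LinkedHashMap<>(); // keeps insertion order
        options.put("initialReconnectDelay", "1000");
        options.put("maxReconnectDelay", "30000");
        System.out.println(wrapInFailover("tcp://master-host:62001", options));
        // prints failover:(tcp://master-host:62001)?initialReconnectDelay=1000&maxReconnectDelay=30000
    }
}
```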

Rob Davies - 02/Sep/08 04:49 AM
Patch applied in SVN revision 691206.