Details
- Sub-task
- Status: Closed
- Major
- Resolution: Fixed
- None
- Reviewed
Description
We need a clean way to handle long network partitions without impacting the master cluster (the one that pushes the data). Currently the master just retries over and over again.
I think we could:
- Stop replication to a slave cluster if it hasn't responded for more than 10 minutes
- Keep track of the duration of the partition
- When the slave cluster comes back, initiate an MR job like HBASE-2221
Maybe we want less than 10 minutes; maybe we want all of this to be automatic, or only the first two parts. Discuss.
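To make the proposal concrete, here is a minimal sketch of the first two steps (suspend after a silence threshold, and track how long the partition lasted). It is not HBase's replication code: the class `PartitionTracker`, its methods, and the `State` enum are hypothetical names chosen for illustration, and the 10-minute threshold is only the value floated above. The third step (the catch-up MR job) is left to the caller, which receives the partition duration when the slave comes back.

```java
import java.util.OptionalLong;

/** Hypothetical helper, not part of HBase: tracks one slave cluster's reachability. */
final class PartitionTracker {
    enum State { REPLICATING, SUSPENDED }

    private final long timeoutMillis;   // e.g. 10 minutes, per the proposal
    private long lastSuccessMillis;     // last time the slave acknowledged a shipment
    private State state = State.REPLICATING;

    PartitionTracker(long timeoutMillis, long nowMillis) {
        this.timeoutMillis = timeoutMillis;
        this.lastSuccessMillis = nowMillis;
    }

    /** Record a failed attempt; suspend once the slave has been silent longer than the timeout. */
    State onFailure(long nowMillis) {
        if (state == State.REPLICATING && nowMillis - lastSuccessMillis > timeoutMillis) {
            state = State.SUSPENDED;
        }
        return state;
    }

    /**
     * Record a successful contact. If replication was suspended, resume it and
     * return how long the slave was unreachable so the caller can size a catch-up job.
     */
    OptionalLong onSuccess(long nowMillis) {
        OptionalLong partitionMillis = OptionalLong.empty();
        if (state == State.SUSPENDED) {
            partitionMillis = OptionalLong.of(nowMillis - lastSuccessMillis);
            state = State.REPLICATING;
        }
        lastSuccessMillis = nowMillis;
        return partitionMillis;
    }
}
```

Passing the clock in as a parameter keeps the sketch testable. Whether resuming should be automatic or require operator action is exactly the open question raised above, so the sketch only reports the duration and does not launch anything.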
Attachments
Issue Links
- is blocked by
  - HBASE-2707 Can't recover from a dead ROOT server if any exceptions happens during log splitting (Closed)
  - HBASE-2539 Cannot start ZK before the rest in tests anymore (Closed)
  - HBASE-2735 Make HBASE-2694 replication-friendly (Closed)
- is depended upon by
  - HBASE-2611 Handle RS that fails while processing the failure of another one (Closed)
- is related to
  - HBASE-2791 Stop dumping exceptions coming from ZK and do nothing about them (Closed)
  - HBASE-2809 Accounting of ReplicationSource's memory usage (Closed)
  - HBASE-2808 Document the implementation of replication (Closed)
  - HBASE-2810 Profiling of ReplicationSource to determine if it's better to reuse lists or not (Closed)
- requires
  - HBASE-2529 Make OldLogsCleaner easier to extend (Closed)
  - HBASE-2527 Add the ability to easily extend some HLog actions (Closed)
  - HBASE-2534 Recursive deletes and misc improvements to ZKW (Closed)