[HBASE-12770] Don't transfer all the queued hlogs of a dead server to the same alive server - ASF JIRA

Details

Type: Improvement
Status: Closed
Priority: Minor
Resolution: Fixed
Affects Version/s: 1.4.0, 2.0.0
Fix Version/s: 1.4.0, 1.3.1, 2.0.0
Component/s: Replication
Labels: None
Hadoop Flags: Reviewed

Description

When a region server goes down (or the cluster restarts), all of its hlog queues are transferred to a single alive region server. In a shared cluster, we might create several peers replicating data to different peer clusters. Many hlogs can be queued for these peers for several reasons: some peers might be disabled, errors from a peer cluster might block replication, or the replication sources might fail to read some hlogs because of HDFS problems. Then, if the server goes down or is restarted, another alive server takes over all the replication jobs of the dead server. This can put heavy pressure on the resources (network, disk reads) of the alive server, and it is also not fast enough at replicating the queued hlogs. If that alive server goes down as well, all of its replication jobs, including those it took over from other dead servers, are once again transferred wholesale to another alive server. This can leave a single server with a large number of queued hlogs (in our shared cluster, we found that one server might have thousands of hlogs queued for replication).

As an alternative, would it be reasonable for an alive server to transfer only one peer's hlogs from the dead server at a time? Other alive region servers would then have the opportunity to transfer the hlogs of the remaining peers. This may also help the queued hlogs be processed faster. Any discussion is welcome.
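To make the proposal concrete, here is a minimal in-memory sketch of "one peer per claim". It is not the attached patch and does not use any HBase or ZooKeeper API: the class `PerPeerQueueClaim`, the method `claimOnePeerAtATime`, and the server and peer names are all hypothetical. Real region servers would race for queues rather than take turns; the round-robin loop below only approximates that so the resulting spread is easy to see.

```java
import java.util.ArrayDeque;
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;
import java.util.Queue;

// Hypothetical illustration only; not the HBase implementation.
class PerPeerQueueClaim {

    /**
     * Spreads a dead server's per-peer hlog queues across the alive servers.
     * Each alive server claims at most one peer's queue per turn, so no
     * single server inherits every queue of the dead server.
     *
     * @param deadServerQueues peer id -> hlogs queued for that peer
     * @param aliveServers     alive region servers (assumed to take turns)
     * @return alive server -> peer ids whose queues it claimed
     */
    static Map<String, List<String>> claimOnePeerAtATime(
            Map<String, List<String>> deadServerQueues, List<String> aliveServers) {
        if (aliveServers.isEmpty()) {
            throw new IllegalArgumentException("no alive server to claim queues");
        }
        Map<String, List<String>> claimed = new LinkedHashMap<>();
        for (String server : aliveServers) {
            claimed.put(server, new ArrayList<>());
        }
        Queue<String> unclaimedPeers = new ArrayDeque<>(deadServerQueues.keySet());
        while (!unclaimedPeers.isEmpty()) {
            for (String server : aliveServers) {
                String peer = unclaimedPeers.poll(); // one peer per claim
                if (peer == null) {
                    break; // nothing left to claim
                }
                claimed.get(server).add(peer);
            }
        }
        return claimed;
    }

    public static void main(String[] args) {
        // Insertion-ordered map so the claim order is deterministic.
        Map<String, List<String>> deadServerQueues = new LinkedHashMap<>();
        deadServerQueues.put("peer1", List.of("hlog.1", "hlog.2"));
        deadServerQueues.put("peer2", List.of("hlog.1"));
        deadServerQueues.put("peer3", List.of("hlog.2", "hlog.3"));

        System.out.println(
                claimOnePeerAtATime(deadServerQueues, List.of("rs1", "rs2")));
        // prints {rs1=[peer1, peer3], rs2=[peer2]}
    }
}
```

With the current behavior, `rs1` alone would end up with the queues of all three peers. In the sketch, the three queues are split between `rs1` and `rs2`, which is the load-spreading effect the proposal is after.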

Attachments

HBASE-12770-branch-1-v1.patch (04/Aug/16 11:33, 27 kB, Phil Yang)
HBASE-12770-branch-1-v2.patch (04/Aug/16 12:19, 27 kB, Phil Yang)
HBASE-12770-branch-1-v3.patch (08/Aug/16 06:13, 28 kB, Phil Yang)
HBASE-12770-branch-1-v3.patch (08/Aug/16 05:04, 28 kB, Phil Yang)
HBASE-12770-branch-1-v3.patch (05/Aug/16 06:21, 28 kB, Phil Yang)
HBASE-12770-branch-1-v3.patch (05/Aug/16 03:54, 28 kB, Phil Yang)
HBASE-12770-trunk.patch (07/Jan/15 12:07, 20 kB, Jianwei Cui)
HBASE-12770-v1.patch (21/Jul/16 09:36, 37 kB, Phil Yang)
HBASE-12770-v2.patch (04/Aug/16 12:17, 37 kB, Phil Yang)
HBASE-12770-v3.patch (08/Aug/16 02:28, 37 kB, Phil Yang)
HBASE-12770-v3.patch (05/Aug/16 03:22, 37 kB, Phil Yang)

Issue Links

breaks: HBASE-26482 HMaster may clean wals that is replicating in rare cases (Closed)
duplicates: HBASE-16581 Optimize Replication queue transfers after server fail over (Closed)

People

Assignee: Phil Yang
Reporter: Jianwei Cui
Votes: 0
Watchers: 13

Dates

Created: 29/Dec/14 14:25
Updated: 17/Jun/22 18:41
Resolved: 08/Aug/16 08:37