[HBASE-18549] Unclaimed replication queues can go undetected - ASF JIRA

Details

Type: Bug
Status: Resolved
Priority: Critical
Resolution: Fixed
Affects Version/s: None
Fix Version/s: 3.0.0-alpha-1, 2.2.0, 1.4.8, 2.1.1
Component/s: Replication
Labels: None
Hadoop Flags: Reviewed

Description

We have repeatedly seen ZooKeeper issues cause NodeFailoverWorker to silently fail to pick up the replication queue of a dead region server. One example is when the znode size for a particular queue exceeds the jute.maxbuffer value.

Other situations can lead to the same failure and likewise go undetected. We need a metric for the number of unclaimed replication queues. Alerting on this metric would help mitigate the problem and identify the underlying issues.
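As a rough illustration of what such a metric would count, the sketch below treats a queue as unclaimed when the region server that owns it is no longer live. It is not the code from the attached patches: the class name `UnclaimedQueueCounter`, the method `countUnclaimedQueues`, and the plain map and set inputs are hypothetical stand-ins for whatever the replication queue storage and the live region server list actually provide.

```java
import java.util.List;
import java.util.Map;
import java.util.Set;

// Hypothetical sketch, not HBase code: counts replication queues whose
// owning region server is absent from the set of live servers.
public class UnclaimedQueueCounter {

    // queuesByOwner: server name -> ids of the replication queues it owns (assumed input shape)
    // liveServers:   names of region servers currently alive (assumed input shape)
    static int countUnclaimedQueues(Map<String, List<String>> queuesByOwner,
                                    Set<String> liveServers) {
        int unclaimed = 0;
        for (Map.Entry<String, List<String>> entry : queuesByOwner.entrySet()) {
            // A queue still registered under a dead server has not been claimed by anyone.
            if (!liveServers.contains(entry.getKey())) {
                unclaimed += entry.getValue().size();
            }
        }
        return unclaimed;
    }

    public static void main(String[] args) {
        // Made-up server names and queue ids, for illustration only.
        Map<String, List<String>> queues = Map.of(
            "rs1,16020,1", List.of("peer1"),
            "rs2,16020,1", List.of("peer1", "peer2"));
        Set<String> live = Set.of("rs1,16020,1");
        System.out.println(countUnclaimedQueues(queues, live));
    }
}
```

A value that stays above zero for longer than a normal failover takes would be the condition to alert on.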

Attachments

HBASE-18549-.master.001.patch, 23/Aug/18 07:12, 10 kB, Xu Cang
HBASE-18549-.master.002.patch, 23/Aug/18 19:38, 13 kB, Xu Cang
HBASE-18549-.master.003.patch, 23/Aug/18 23:05, 12 kB, Xu Cang
HBASE-18549-.master.004.patch, 29/Aug/18 22:50, 12 kB, Xu Cang
HBASE-18549.branch-1.001.patch, 29/Aug/18 23:40, 10 kB, Xu Cang
HBASE-18549.branch-1.001.patch, 04/Sep/18 21:15, 10 kB, Xu Cang

People

Assignee: Xu Cang

Reporter: Ashu Pachauri

Votes: 0

Watchers: 9

Dates

Created: 09/Aug/17 20:58

Updated: 01/Feb/19 19:55

Resolved: 02/Oct/18 01:51