[HBASE-20632] Failure of RSes belonging to RSgroup for System tables makes the cluster unavailable - ASF JIRA

Details

Type: Bug
Status: Resolved
Priority: Critical
Resolution: Abandoned
Affects Version/s: 3.0.0-alpha-1
Fix Version/s: None
Component/s: master, regionserver
Labels:
- rsgroup

Description

This was reproduced on a local cluster (no HDFS). The steps are as follows.

Start a single-node cluster, then start an additional region server (RS) using local-regionservers.sh.
Through the hbase shell, add a new rsgroup:

hbase(main):001:0> add_rsgroup 'test_rsgroup'
Took 0.5503 seconds
hbase(main):002:0> list_rsgroups
NAME                 SERVER / TABLE
 test_rsgroup
 default             server dob2-r3n13:16020
                     server dob2-r3n13:16022
                     table hbase:meta
                     table hbase:acl
                     table hbase:quota
                     table hbase:namespace
                     table hbase:rsgroup
2 row(s)
Took 0.0419 seconds

Move one of the region servers to the new rsgroup

hbase(main):004:0> move_servers_rsgroup 'test_rsgroup',['dob2-r3n13:16020']
Took 6.4894 seconds
hbase(main):005:0> exit

Stop the region server that is left in the default rsgroup:
```
local-regionservers.sh stop 2
```

The cluster becomes unusable and stays unusable even if the region server is restarted, or even if all the services are brought down and back up.

In 1.1.x, the cluster recovers fine. It looks like hbase:meta is assigned to a dummy region server while the real one is down, and gets assigned properly once the region server is restarted. The following is what the master UI shows while the RS is down:

1588230740	hbase:meta,,1.1588230740 state=PENDING_OPEN, ts=Wed May 23 18:24:01 EDT 2018 (1s ago), server=localhost,1,1
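
The condition that the steps above create can be sketched as a small model: after the move, the default rsgroup still owns every system table but has only one server, so stopping that server leaves the group with tables and no live server. This is only an illustration built from the shell output in this report. It is not HBase's assignment code, and the `RSGroup` class and `groups_without_live_servers` function are hypothetical names.

```python
from dataclasses import dataclass, field


@dataclass
class RSGroup:
    """Hypothetical stand-in for one row of list_rsgroups output."""
    name: str
    servers: set = field(default_factory=set)
    tables: set = field(default_factory=set)


def groups_without_live_servers(groups, live_servers):
    """Return the names of groups that host tables but have no live server."""
    return sorted(
        g.name for g in groups
        if g.tables and not (g.servers & live_servers)
    )


# Group membership after move_servers_rsgroup, taken from the report.
groups = [
    RSGroup("test_rsgroup", servers={"dob2-r3n13:16020"}),
    RSGroup(
        "default",
        servers={"dob2-r3n13:16022"},
        tables={"hbase:meta", "hbase:acl", "hbase:quota",
                "hbase:namespace", "hbase:rsgroup"},
    ),
]

# Both region servers running: no group is stranded.
both_up = groups_without_live_servers(
    groups, {"dob2-r3n13:16020", "dob2-r3n13:16022"})

# After stopping the server left in the default group, only 16020 is alive.
after_stop = groups_without_live_servers(groups, {"dob2-r3n13:16020"})

print(both_up)
print(after_stop)
```

A check of this kind before stopping a server would flag the default group in this scenario. It does not explain why the cluster fails to recover after the region server is restarted, which is the actual bug reported here.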

Attachments

20632.v1.txt
24/May/18 01:30
1 kB
Ted Yu

People

Assignee: Unassigned
Reporter: Biju Nair
Votes: 0
Watchers: 6

Dates

Created: 23/May/18 23:06
Updated: 12/Jun/22 17:33
Resolved: 12/Jun/22 17:33