Details
- Type: Bug
- Status: Resolved
- Priority: Major
- Resolution: Won't Fix
- Affects Version/s: 3.0.0-alpha-1, 2.2.3, 2.2.4, 2.2.5
- None
- None
Description
How to reproduce:
- user starts an HBase cluster on top of a file system
- user performs some operations and shuts down the cluster; all the data remains persisted in the file system
- user creates a new HBase cluster using a different set of servers on top of the same file system with the same root directory
- HMaster cannot initialize
Root cause:
During the HMaster initialization phase, the following happens:
- HMaster waits for the namespace table to come online
- AssignmentManager loads the region info for all namespace table regions
- the region servers recorded as hosting the namespace table are already dead, so the online check fails
- HMaster keeps waiting for the namespace regions to come online, retrying up to 1000 times, which in practice means forever
Code waiting for namespace table to be online: https://github.com/apache/hbase/blob/rel/2.2.3/hbase-server/src/main/java/org/apache/hadoop/hbase/master/HMaster.java#L1102
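The wait described above can be modelled with a small, self-contained simulation. This is only a sketch of the logic in this report, not HBase code: the names `RegionState`, `is_region_online`, and `wait_for_region_online`, and the new cluster's server name, are hypothetical, and the sleep between attempts is omitted.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class RegionState:
    encoded_name: str
    state: str   # e.g. "OPEN"
    server: str  # last recorded hosting server, "host,port,startcode"


def is_region_online(region, live_servers):
    """A region counts as online only if it is OPEN on a server that is alive."""
    return region.state == "OPEN" and region.server in live_servers


def wait_for_region_online(region, live_servers, max_attempts=1000):
    """Return the attempt number on success, or None if every attempt failed.

    Nothing in this loop reassigns the region, so a region recorded on a
    dead server fails every attempt.
    """
    for attempt in range(1, max_attempts + 1):
        if is_region_online(region, live_servers):
            return attempt
    return None


# Region state taken from the log line in this report.
namespace_region = RegionState(
    encoded_name="d34b65b91a52644ed3e77c5fbb065c2b",
    state="OPEN",
    server="ip-10-12-13-14.ec2.internal,16020,1587628031614",
)

# Hypothetical server set of the new cluster (assumed for illustration).
new_cluster = {"new-rs-1.example.internal,16020,1587630000000"}

if __name__ == "__main__":
    print(wait_for_region_online(namespace_region, new_cluster, max_attempts=5))
```

Because the recorded server never appears in the new cluster's live set, raising `max_attempts` only delays the failure; the region has to be reassigned for the check to pass.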
Log output (running on S3):
2020-04-23 08:15:57,185 WARN [master/ip-10-12-13-14:16000:becomeActiveMaster] master.HMaster: hbase:namespace,,1587628169070.d34b65b91a52644ed3e77c5fbb065c2b. is NOT online; state={d34b65b91a52644ed3e77c5fbb065c2b state=OPEN, ts=1587629742129, server=ip-10-12-13-14.ec2.internal,16020,1587628031614}; ServerCrashProcedures=false. Master startup cannot progress, in holding-pattern until region onlined.
where ip-10-12-13-14.ec2.internal is the old region server hosting the region of hbase:namespace.
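To confirm that a region is pinned to a server from the old cluster, the recorded server can be pulled out of the warning and compared with the hosts of the new cluster. The snippet below is a rough sketch: the regular expression assumes the `server=host,port,startcode` layout seen in the log line above, and `recorded_server`, `is_stale`, and the live host name are hypothetical.

```python
import re

# Fragment of the warning from this report containing the region state.
LINE = (
    "state={d34b65b91a52644ed3e77c5fbb065c2b state=OPEN, ts=1587629742129, "
    "server=ip-10-12-13-14.ec2.internal,16020,1587628031614}; "
    "ServerCrashProcedures=false."
)

# Assumed layout: server=<host>,<port>,<startcode>
SERVER_RE = re.compile(r"server=(?P<host>[^,}]+),(?P<port>\d+),(?P<startcode>\d+)")


def recorded_server(log_line):
    """Return (host, port, startcode) from the log line, or None if absent."""
    match = SERVER_RE.search(log_line)
    if match is None:
        return None
    return match.group("host"), int(match.group("port")), int(match.group("startcode"))


def is_stale(log_line, live_hosts):
    """True if the region is recorded on a host that is not in the live set."""
    server = recorded_server(log_line)
    return server is not None and server[0] not in live_hosts


if __name__ == "__main__":
    # Hypothetical host of the new cluster (assumed for illustration).
    print(recorded_server(LINE))
    print(is_stale(LINE, {"new-rs-1.example.internal"}))
```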
Discussion for the fix
There is already a fix for this on branch-3: https://issues.apache.org/jira/browse/HBASE-21154. Before we provide a patch, we would like to hear from the community whether we should backport that change to branch-2 or instead make a fix with minimal code changes.
Issue Links
- is blocked by
  - HBASE-24833 Bootstrap should not delete the META table directory if it's not partial (Resolved)
- relates to
  - HBASE-27044 Serialized procedures which point to users from other Kerberos domains can prevent master startup (Open)
  - HBASE-25587 [hbck2] Schedule SCP for all unknown servers (Resolved)
  - HBASE-26193 Do not store meta region location as permanent state on zookeeper (Resolved)
- links to