Details
- Bug
- Status: Resolved
- Major
- Resolution: Fixed
- 1.7.1
- None
- None
- Reviewed
Description
By enabling the config hbase.assignment.usezk.migrating, we initiate the transition of an HBase 1.x cluster from the default ZK-based region assignment to ZK-less region assignment. Once the migration is enabled, any subsequent region transition adds two additional CQs in meta: info:sn and info:state. The workflow that adds new CQs in meta should be the only workflow reading them (unless coordination among multiple workflows is required), but that is not the case here. Reading info:sn and info:state to rebuild user region states in the RegionStateStore data structure is a hidden bug because the read is not restricted to ZK-less region assignment.
What are the effects?
After enabling ZK-less migration, if we revert it, info:state and info:sn are not reverted. Moreover, a new active master rebuilds the region states in memory and uses this info. If all regions have consistent info:sn values (i.e. consistent with info:server and info:serverstartcode), nothing goes wrong, and this is the likely outcome when we revert the config with a rolling restart of masters. However, after this config revert, if any region moves, only info:server and info:serverstartcode get updated, while the info:sn and info:state values stay the same. Because of the missing condition, a subsequent active master restart would try to rebuild region states and assign regions as per info:sn, but those regions are already OPEN on info:server, hence we get doubly assigned regions.
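The failure sequence above can be illustrated with a small, self-contained model. This is only a sketch and not HBase code: a meta row is modeled as a plain map from column qualifier to value, and the class StaleSnDemo and its methods recordMove, actualLocation and locationSeenByUnguardedMaster are hypothetical names invented for the illustration. The value formats (host:port for info:server, host,port,startcode for info:sn) are simplified assumptions.

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical, simplified model of one hbase:meta row (not real HBase code).
class StaleSnDemo {

    // Records a region move. serverName is assumed to look like "host,port,startcode".
    // info:sn and info:state are only written while ZK-less migration is enabled.
    static void recordMove(Map<String, String> row, String serverName, boolean migrationEnabled) {
        String[] parts = serverName.split(",");
        row.put("info:server", parts[0] + ":" + parts[1]);
        row.put("info:serverstartcode", parts[2]);
        if (migrationEnabled) {
            row.put("info:sn", serverName);
            row.put("info:state", "OPEN");
        }
    }

    // Where the region really is, according to info:server and info:serverstartcode.
    static String actualLocation(Map<String, String> row) {
        String[] hostPort = row.get("info:server").split(":");
        return hostPort[0] + "," + hostPort[1] + "," + row.get("info:serverstartcode");
    }

    // The unguarded read: info:sn wins whenever it is present, whatever the assignment mode.
    static String locationSeenByUnguardedMaster(Map<String, String> row) {
        String sn = row.get("info:sn");
        return sn != null ? sn : actualLocation(row);
    }

    public static void main(String[] args) {
        Map<String, String> row = new HashMap<>();
        recordMove(row, "rs1,16020,100", true);   // move while migration is enabled
        recordMove(row, "rs2,16020,200", false);  // move after the config revert
        System.out.println("actual location:  " + actualLocation(row));
        System.out.println("master rebuilds:  " + locationSeenByUnguardedMaster(row));
    }
}
```

In this model the second move leaves info:sn pointing at rs1 while the region is actually open on rs2, which is the mismatch that leads a restarted master to assign the region a second time.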
We need a two-part fix for this:
- Guard reading of info:sn and info:state with proper conditions.
- Once active master init is complete, if ZK-based region assignment is enabled and redundant CQs (info:sn and info:state) are present in meta, delete them all.
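The two parts of the fix could look roughly like the sketch below. It is not the actual patch: meta is again modeled as plain maps, the class GuardedMetaRead and its methods resolveLocation and deleteRedundantColumns are hypothetical names, and the guard condition (read info:sn only when ZK-less assignment is in effect) is an assumption about what "proper conditions" means here.

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical sketch of the two-part fix on a simplified meta model (not the real patch).
class GuardedMetaRead {

    // Part 1: only trust info:sn when ZK-less assignment is in effect (assumed guard).
    // Otherwise fall back to info:server + info:serverstartcode.
    static String resolveLocation(Map<String, String> row, boolean zkLessAssignment) {
        if (zkLessAssignment) {
            String sn = row.get("info:sn");
            if (sn != null) {
                return sn;
            }
        }
        String server = row.get("info:server");
        String startCode = row.get("info:serverstartcode");
        if (server == null || startCode == null) {
            return null; // region has no recorded location
        }
        String[] hostPort = server.split(":");
        return hostPort[0] + "," + hostPort[1] + "," + startCode;
    }

    // Part 2: after master init, with ZK-based assignment enabled, drop the redundant CQs.
    // Returns the number of cells removed.
    static int deleteRedundantColumns(Map<String, Map<String, String>> meta, boolean zkBasedAssignment) {
        if (!zkBasedAssignment) {
            return 0; // the columns are still needed in ZK-less mode
        }
        int deleted = 0;
        for (Map<String, String> row : meta.values()) {
            if (row.remove("info:sn") != null) deleted++;
            if (row.remove("info:state") != null) deleted++;
        }
        return deleted;
    }

    public static void main(String[] args) {
        Map<String, String> row = new HashMap<>();
        row.put("info:server", "rs2:16020");
        row.put("info:serverstartcode", "200");
        row.put("info:sn", "rs1,16020,100"); // stale value left over from the migration
        row.put("info:state", "OPEN");
        Map<String, Map<String, String>> meta = new HashMap<>();
        meta.put("region-a", row);
        System.out.println("guarded read: " + resolveLocation(row, false));
        System.out.println("cells deleted: " + deleteRedundantColumns(meta, true));
    }
}
```

With the guard in place, the stale info:sn is ignored under ZK-based assignment, and the cleanup step removes it so that a later read cannot pick it up again.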