Details
Bug
Status: Open
Critical
Resolution: Unresolved
2.0.0
None
None
None
Important
Description
When we check the number of regions in transition in Ambari, it shows 1 region waiting in transition. (The count is higher than 1 on other clusters.)
Also, when we check the table with the command "hbase hbck -details table_name", the status is reported as INCONSISTENT:
There are 0 overlap groups with 0 overlapping regions
ERROR: Found inconsistency in table Table_Name
Summary:
Table hbase:meta is okay.
Number of regions: 1
Deployed on: hostname1:port, hostname2:port, hostname3:port, hostname4:port
Table Table_Name is okay.
Number of regions: 39
Deployed on: hostname1:port, hostname2:port, hostname3:port, hostname4:port
2 inconsistencies detected.
Status: INCONSISTENT
When I checked the log files, I saw the following warning message:
2019-06-09T07:14:15.179+02:00 WARN org.apache.hadoop.hbase.master.assignment.AssignmentManager: STUCK Region-In-Transition rit=CLOSING, location=hostname,port,1558699727048, table=table_name, region=c67dd5d8bcd174cc2001695c31475ab1
According to this message, the transition of region c67dd5d8bcd174cc2001695c31475ab1 on that host is stuck (the log reports rit=CLOSING).
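For anyone who wants to pick these fields out of the master log automatically (for example in a monitoring job), here is a minimal sketch. It assumes the warning always has exactly the format shown above. The class StuckRitLogParser and its parse method are hypothetical helpers written for this report and are not part of HBase.

```java
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical helper (not part of HBase): extracts the fields of a
// "STUCK Region-In-Transition" warning, assuming the format shown above.
class StuckRitLogParser {
    private static final Pattern STUCK_RIT = Pattern.compile(
        "STUCK Region-In-Transition rit=(\\w+), location=(.+?), table=(.+?), region=(\\w+)");

    // Returns the parsed fields, or an empty map if the line does not match.
    static Map<String, String> parse(String line) {
        Map<String, String> fields = new LinkedHashMap<>();
        Matcher m = STUCK_RIT.matcher(line);
        if (m.find()) {
            fields.put("state", m.group(1));     // e.g. CLOSING
            fields.put("location", m.group(2));  // host,port,startcode
            fields.put("table", m.group(3));
            fields.put("region", m.group(4));    // encoded region name
        }
        return fields;
    }

    public static void main(String[] args) {
        String line = "2019-06-09T07:14:15.179+02:00 WARN "
            + "org.apache.hadoop.hbase.master.assignment.AssignmentManager: "
            + "STUCK Region-In-Transition rit=CLOSING, "
            + "location=hostname,port,1558699727048, table=table_name, "
            + "region=c67dd5d8bcd174cc2001695c31475ab1";
        System.out.println(parse(line));
    }
}
```

The "region" value is the encoded region name, which is the same string passed to the shell assign command in this report.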
We stopped the RegionServer (RS) process on that host and force-assigned the region to another RS that was running:
hbase(main):001:0> assign 'c67dd5d8bcd174cc2001695c31475ab1'
After that operation, the INCONSISTENT status was gone, and we restarted the RS on the host.
One reason a region gets stuck in transition is that, while it is being moved across RegionServers, it is unassigned from the source RegionServer but never assigned to another RegionServer.
I think the code below is responsible for this process:
private void handleRegionOverStuckWarningThreshold(final RegionInfo regionInfo) {
  final RegionStateNode regionNode = regionStates.getRegionStateNode(regionInfo);
  //if (regionNode.isStuck()) {
  LOG.warn("STUCK Region-In-Transition {}", regionNode);
}
It seems one potential way to unstick the region is to send a close request to the RegionServer. The transition may be blocked because another Procedure holds the exclusive lock and is not releasing it.
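To illustrate what I mean by threshold-based detection, here is a small standalone model. It is only a sketch under my own assumptions and is not how HBase implements it: the class RitStuckDetector, its methods, and the 60-second threshold are all hypothetical. It records when each region entered transition and reports the ones that have been in transition longer than the threshold, which is the point where a recovery action could be triggered instead of only logging a warning.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Hypothetical model (not HBase code): tracks when each region entered
// transition and reports those that exceed a stuck threshold.
class RitStuckDetector {
    private final long stuckThresholdMs;
    // encoded region name -> time (ms) at which the transition started
    private final Map<String, Long> transitionStart = new LinkedHashMap<>();

    RitStuckDetector(long stuckThresholdMs) {
        this.stuckThresholdMs = stuckThresholdMs;
    }

    void transitionStarted(String region, long nowMs) {
        transitionStart.put(region, nowMs);
    }

    void transitionFinished(String region) {
        transitionStart.remove(region);
    }

    // Returns regions whose transition has lasted longer than the threshold.
    List<String> findStuck(long nowMs) {
        List<String> stuck = new ArrayList<>();
        for (Map.Entry<String, Long> e : transitionStart.entrySet()) {
            if (nowMs - e.getValue() > stuckThresholdMs) {
                stuck.add(e.getKey());
            }
        }
        return stuck;
    }

    public static void main(String[] args) {
        // Assumed threshold of 60 seconds, chosen only for this example.
        RitStuckDetector detector = new RitStuckDetector(60_000L);
        detector.transitionStarted("c67dd5d8bcd174cc2001695c31475ab1", 0L);
        detector.transitionStarted("otherRegion", 50_000L);
        detector.transitionFinished("otherRegion");
        // At t = 120 s only the first region is still in transition.
        System.out.println(detector.findStuck(120_000L));
    }
}
```

In this model a caller would decide what to do with the returned regions, for example retrying the close or reassigning the region, which is the step we currently perform by hand.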
My question is: what is the root cause of this problem? I think HBase should be able to fix Region-In-Transition issues on its own.
We can fix this problem manually, but some customers do not have this knowledge, so I think HBase needs to be able to recover by itself.