HBASE-22657

HBase: STUCK Region-In-Transition

Details

    • Type: Bug
    • Status: Open
    • Priority: Critical
    • Resolution: Unresolved
    • Affects Version/s: 2.0.0
    • Fix Version/s: None
    • Component/s: None
    • Labels: None
    • Flags: Important

    Description

      When we check the number of regions in transition in Ambari, it shows 1 transition waiting. (On other clusters it is more than 1.)

      Also, when we check the table with the command "hbase hbck -details table_name", the status is INCONSISTENT:

      There are 0 overlap groups with 0 overlapping regions
      ERROR: Found inconsistency in table Table_Name
      Summary:
      Table hbase:meta is okay.
      Number of regions: 1
      Deployed on: hostname1:port, hostname2:port, hostname3:port, hostname4:port
      Table Table_Name is okay.
      Number of regions: 39
      Deployed on: hostname1:port, hostname2:port, hostname3:port, hostname4:port
      2 inconsistencies detected.
      Status: INCONSISTENT
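
      For monitoring, the summary lines at the end of this output can be checked automatically. The following is a minimal sketch, assuming the output always ends with an "N inconsistencies detected." line and a "Status:" line as in the excerpt above. The class HbckSummary and its methods are hypothetical helpers, not part of HBase.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

/** Hypothetical helper (not an HBase class): reads the summary lines of hbck output. */
final class HbckSummary {
    // Line formats assumed from the hbck excerpt quoted in this report.
    private static final Pattern COUNT = Pattern.compile("(\\d+) inconsistencies detected");
    private static final Pattern STATUS = Pattern.compile("Status: (\\w+)");

    /** Returns the reported inconsistency count, or -1 if the summary line is missing. */
    static int inconsistencies(String hbckOutput) {
        Matcher m = COUNT.matcher(hbckOutput);
        return m.find() ? Integer.parseInt(m.group(1)) : -1;
    }

    /** Returns the final status word (for example INCONSISTENT), or null if absent. */
    static String status(String hbckOutput) {
        Matcher m = STATUS.matcher(hbckOutput);
        return m.find() ? m.group(1) : null;
    }

    public static void main(String[] args) {
        String output = String.join("\n",
            "Table Table_Name is okay.",
            "Number of regions: 39",
            "2 inconsistencies detected.",
            "Status: INCONSISTENT");
        System.out.println(inconsistencies(output) + " " + status(output));
    }
}
```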

      When I checked the log files, I saw the following warning message:

      2019-06-09T07:14:15.179+02:00 WARN org.apache.hadoop.hbase.master.assignment.AssignmentManager: STUCK Region-In-Transition rit=CLOSING, location=hostname,port,1558699727048, table=table_name, region=c67dd5d8bcd174cc2001695c31475ab1
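
      The fields in this warning (state, location, table, encoded region name) are what an operator needs in order to act on it. Below is a minimal sketch of extracting them, assuming the field order and separators of the single line quoted above. StuckRitLine is a hypothetical helper, not an HBase class, and other HBase versions may format the message differently.

```java
import java.util.Optional;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

/** Hypothetical helper (not an HBase class): parses a STUCK Region-In-Transition warning. */
final class StuckRitLine {
    // Field layout assumed from the one log line quoted in this report.
    // The location value itself contains commas, so it is matched as a run of non-space characters.
    private static final Pattern STUCK = Pattern.compile(
        "STUCK Region-In-Transition rit=(\\w+), location=(\\S+), table=([^,]+), region=([0-9a-f]+)");

    final String state;
    final String location;
    final String table;
    final String encodedRegion;

    private StuckRitLine(String state, String location, String table, String encodedRegion) {
        this.state = state;
        this.location = location;
        this.table = table;
        this.encodedRegion = encodedRegion;
    }

    /** Returns the parsed fields, or an empty Optional if the line is not a STUCK warning. */
    static Optional<StuckRitLine> parse(String logLine) {
        Matcher m = STUCK.matcher(logLine);
        if (!m.find()) {
            return Optional.empty();
        }
        return Optional.of(new StuckRitLine(m.group(1), m.group(2), m.group(3), m.group(4)));
    }

    public static void main(String[] args) {
        String line = "2019-06-09T07:14:15.179+02:00 WARN "
            + "org.apache.hadoop.hbase.master.assignment.AssignmentManager: "
            + "STUCK Region-In-Transition rit=CLOSING, location=hostname,port,1558699727048, "
            + "table=table_name, region=c67dd5d8bcd174cc2001695c31475ab1";
        parse(line).ifPresent(r ->
            System.out.println(r.state + " " + r.encodedRegion + " on " + r.location));
    }
}
```

      The encoded region name returned here is the value passed to the shell's assign command.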

      According to this message, region c67dd5d8bcd174cc2001695c31475ab1 is in the CLOSING state on the host given in location, and the transition is stuck.

      We stopped the RegionServer (RS) process on that host and forced assignment of the region to another running RS:

      hbase(main):001:0> assign 'c67dd5d8bcd174cc2001695c31475ab1'

      After that operation, the INCONSISTENT status was gone and we restarted the RS on the host.
      One reason a region gets stuck in transition is that, while it is being moved between RegionServers, it is unassigned from the source RegionServer but never assigned to another one.

      I think the code below is responsible for handling this case.

      private void handleRegionOverStuckWarningThreshold(final RegionInfo regionInfo) {
        final RegionStateNode regionNode = regionStates.getRegionStateNode(regionInfo);
        // The isStuck() check is commented out, so this method only logs a warning
        // and takes no corrective action.
        //if (regionNode.isStuck()) {
        LOG.warn("STUCK Region-In-Transition {}", regionNode);
      }

      It seems one potential way to unstick the region is to send a close request to the RegionServer. The transition may be blocked because another Procedure holds the exclusive lock and is not releasing it.

      My question is: what is the root cause of this problem? I think HBase should be able to fix a stuck Region-In-Transition by itself.
      We can fix this problem manually, but some customers do not have this knowledge, so I think HBase needs to be able to recover on its own.
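
      To make the request concrete, here is a minimal sketch of the kind of threshold-based check that could trigger a recovery step instead of only a log message. Everything in it is hypothetical: RitWatchdog, its methods, and the one-minute threshold are illustrative assumptions and do not correspond to existing HBase classes or configuration.

```java
import java.time.Duration;
import java.time.Instant;
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;

/** Hypothetical watchdog; none of these names exist in HBase. */
final class RitWatchdog {
    private final Duration threshold;
    // Encoded region name -> time at which its current transition started.
    private final Map<String, Instant> transitionStart = new ConcurrentHashMap<>();

    RitWatchdog(Duration threshold) {
        this.threshold = threshold;
    }

    /** Records the start of a transition; an already tracked region keeps its first timestamp. */
    void transitionStarted(String encodedRegion, Instant now) {
        transitionStart.putIfAbsent(encodedRegion, now);
    }

    /** Stops tracking a region whose transition completed. */
    void transitionFinished(String encodedRegion) {
        transitionStart.remove(encodedRegion);
    }

    /** Returns the regions that have been in transition for longer than the threshold. */
    List<String> stuckRegions(Instant now) {
        List<String> stuck = new ArrayList<>();
        for (Map.Entry<String, Instant> e : transitionStart.entrySet()) {
            if (Duration.between(e.getValue(), now).compareTo(threshold) > 0) {
                stuck.add(e.getKey());
            }
        }
        return stuck;
    }

    public static void main(String[] args) {
        RitWatchdog watchdog = new RitWatchdog(Duration.ofMinutes(1));
        Instant start = Instant.parse("2019-06-09T05:00:00Z");
        watchdog.transitionStarted("c67dd5d8bcd174cc2001695c31475ab1", start);
        // A real recovery hook would retry the close or reassign the region here.
        for (String region : watchdog.stuckRegions(start.plus(Duration.ofMinutes(5)))) {
            System.out.println("would attempt recovery for " + region);
        }
    }
}
```

      Whether an automatic retry is safe depends on why the transition is blocked, for example on which Procedure holds the lock, so this only shows where such a hook could sit.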

          People

            Assignee: Unassigned
            Reporter: oktay tuncay (oktaytncy)
            Votes: 1
            Watchers: 8