HBASE-21222: [amv2] Closing region on a non-existent server creates STUCK regions

    Details

    • Type: Bug
    • Status: Resolved
    • Priority: Major
    • Resolution: Duplicate
    • Affects Version/s: None
    • Fix Version/s: None
    • Component/s: amv2
    • Labels: None

      Description

      Ran into this one: a region had been assigned to a server, but after a bunch of crashes and some meddling in the Master Proc WALs, every attempt to unassign it has the procedure fail (see the log below) and the region is then reported as STUCK.

      I broke the lock with the new hbck2 tooling and then tried to offline the region again, but the same thing happened. Bug. Needs a fix.

      2018-09-22 18:36:41,900 INFO org.apache.hadoop.hbase.master.assignment.RegionTransitionProcedure: Dispatch pid=138650, ppid=121871, state=RUNNABLE:REGION_TRANSITION_DISPATCH, locked=true; UnassignProcedure table=IntegrationTestBigLinkedList_20180614072614, region=51cdade76ca7217ec191f39e5f56c61c, server=vd0637.halxg.cloudera.com,22101,1537397969558; rit=CLOSING, location=vd0637.halxg.cloudera.com,22101,1537397969558
      2018-09-22 18:36:41,899 INFO org.apache.hadoop.hbase.master.procedure.MasterProcedureScheduler: pid=138646, ppid=121871, state=RUNNABLE:REGION_TRANSITION_DISPATCH; UnassignProcedure table=IntegrationTestBigLinkedList_20180614072614, region=0780467efe4c5901887fb12bfa406fa7, server=vc1228.halxg.cloudera.com,22101,1537578279837 checking lock on 0780467efe4c5901887fb12bfa406fa7
      2018-09-22 18:36:41,900 WARN org.apache.hadoop.hbase.master.assignment.RegionTransitionProcedure: Remote call failed vd0637.halxg.cloudera.com,22101,1537397969558; pid=138650, ppid=121871, state=RUNNABLE:REGION_TRANSITION_DISPATCH, locked=true; UnassignProcedure table=IntegrationTestBigLinkedList_20180614072614, region=51cdade76ca7217ec191f39e5f56c61c, server=vd0637.halxg.cloudera.com,22101,1537397969558; rit=CLOSING, location=vd0637.halxg.cloudera.com,22101,1537397969558; exception=NoServerDispatchException
      org.apache.hadoop.hbase.procedure2.NoServerDispatchException: vd0637.halxg.cloudera.com,22101,1537397969558; pid=138650, ppid=121871, state=RUNNABLE:REGION_TRANSITION_DISPATCH, locked=true; UnassignProcedure table=IntegrationTestBigLinkedList_20180614072614, region=51cdade76ca7217ec191f39e5f56c61c, server=vd0637.halxg.cloudera.com,22101,1537397969558
              at org.apache.hadoop.hbase.procedure2.RemoteProcedureDispatcher.addOperationToNode(RemoteProcedureDispatcher.java:177)
              at org.apache.hadoop.hbase.master.assignment.RegionTransitionProcedure.addToRemoteDispatcher(RegionTransitionProcedure.java:277)
              at org.apache.hadoop.hbase.master.assignment.UnassignProcedure.updateTransition(UnassignProcedure.java:202)
              at org.apache.hadoop.hbase.master.assignment.RegionTransitionProcedure.execute(RegionTransitionProcedure.java:370)
              at org.apache.hadoop.hbase.master.assignment.RegionTransitionProcedure.execute(RegionTransitionProcedure.java:97)
              at org.apache.hadoop.hbase.procedure2.Procedure.doExecute(Procedure.java:924)
              at org.apache.hadoop.hbase.procedure2.ProcedureExecutor.execProcedure(ProcedureExecutor.java:1684)
              at org.apache.hadoop.hbase.procedure2.ProcedureExecutor.executeProcedure(ProcedureExecutor.java:1471)
              at org.apache.hadoop.hbase.procedure2.ProcedureExecutor.access$900(ProcedureExecutor.java:77)
              at org.apache.hadoop.hbase.procedure2.ProcedureExecutor$WorkerThread.run(ProcedureExecutor.java:1983)
      2018-09-22 18:36:41,903 WARN org.apache.hadoop.hbase.master.assignment.UnassignProcedure: Expiring vd0637.halxg.cloudera.com,22101,1537397969558, pid=138650, ppid=121871, state=RUNNABLE:REGION_TRANSITION_DISPATCH, locked=true; UnassignProcedure table=IntegrationTestBigLinkedList_20180614072614, region=51cdade76ca7217ec191f39e5f56c61c, server=vd0637.halxg.cloudera.com,22101,1537397969558 rit=CLOSING, location=vd0637.halxg.cloudera.com,22101,1537397969558; exception=NoServerDispatchException
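
      When triaging a Master log for this failure, it can help to pull the pid, region, and server out of each line that reports NoServerDispatchException. The following is a minimal sketch, not part of HBase or hbck2: the helper names (parse_procedure_line, failed_dispatches) are hypothetical, and the regular expressions assume the field layout of the log lines quoted above, which other HBase versions may not share.

```python
import re

# Field patterns inferred from the log lines quoted in this issue (assumption:
# other HBase versions may lay these fields out differently).
_FIELDS = {
    "pid": re.compile(r"\bpid=(\d+)"),
    "ppid": re.compile(r"\bppid=(\d+)"),
    "state": re.compile(r"\bstate=([A-Z_:]+)"),
    "table": re.compile(r"\btable=([^,\s;]+)"),
    "region": re.compile(r"\bregion=([0-9a-f]{32})"),
    "server": re.compile(r"\bserver=([^;\s]+)"),
    "rit": re.compile(r"\brit=([A-Z_]+)"),
    "exception": re.compile(r"\bexception=(\w+)"),
}

# One WARN line copied from the log in this issue.
SAMPLE = (
    "2018-09-22 18:36:41,900 WARN "
    "org.apache.hadoop.hbase.master.assignment.RegionTransitionProcedure: "
    "Remote call failed vd0637.halxg.cloudera.com,22101,1537397969558; "
    "pid=138650, ppid=121871, state=RUNNABLE:REGION_TRANSITION_DISPATCH, "
    "locked=true; UnassignProcedure "
    "table=IntegrationTestBigLinkedList_20180614072614, "
    "region=51cdade76ca7217ec191f39e5f56c61c, "
    "server=vd0637.halxg.cloudera.com,22101,1537397969558; rit=CLOSING, "
    "location=vd0637.halxg.cloudera.com,22101,1537397969558; "
    "exception=NoServerDispatchException"
)


def parse_procedure_line(line):
    """Return the procedure fields found in one log line as a dict."""
    found = {}
    for name, pattern in _FIELDS.items():
        match = pattern.search(line)
        if match:
            found[name] = match.group(1)
    return found


def failed_dispatches(lines):
    """Map pid -> (region, server) for lines tagged NoServerDispatchException.

    Lines without an "exception=" field (for example the stack trace header)
    are skipped; repeated reports for the same pid collapse into one entry.
    """
    failures = {}
    for line in lines:
        fields = parse_procedure_line(line)
        if fields.get("exception") != "NoServerDispatchException":
            continue
        if "pid" in fields:
            failures[fields["pid"]] = (fields.get("region"), fields.get("server"))
    return failures


if __name__ == "__main__":
    for pid, (region, server) in failed_dispatches([SAMPLE]).items():
        print(pid, region, server)
```

      Feeding the whole Master log through failed_dispatches gives one entry per procedure that could not be dispatched, which is a starting list of regions to check for the STUCK report.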
      
      

            People

            • Assignee: Michael Stack (stack)
            • Reporter: Michael Stack (stack)
            • Votes: 0
            • Watchers: 2
