Uploaded image for project: 'HBase'
  1. HBase
  2. HBASE-28405

Region open procedure silently returns without notifying the parent proc

Log workAgile BoardRank to TopRank to BottomAttach filesAttach ScreenshotBulk Copy AttachmentsBulk Move AttachmentsVotersWatch issueWatchersCreate sub-taskConvert to sub-taskMoveLinkCloneLabelsUpdate Comment AuthorReplace String in CommentUpdate Comment VisibilityDelete Comments
    XMLWordPrintableJSON

Details

    • Reviewed

    Description

      We had a scenario in production where a merge operation had failed as below

      2024-02-11 10:53:57,715 ERROR [PEWorker-31] assignment.MergeTableRegionsProcedure - Error trying to merge [a92008b76ccae47d55c590930b837036, f56752ae9f30fad9de5a80a8ba578e4b] in table1 (in state=MERGE_TABLE_REGIONS_CLOSE_REGIONS)
      org.apache.hadoop.hbase.HBaseIOException: The parent region state=MERGING, location=rs-229,60020,1707587658182, table=table1, region=f56752ae9f30fad9de5a80a8ba578e4b is currently in transition, give up
      at org.apache.hadoop.hbase.master.assignment.AssignmentManagerUtil.createUnassignProceduresForSplitOrMerge(AssignmentManagerUtil.java:120)
      at org.apache.hadoop.hbase.master.assignment.MergeTableRegionsProcedure.createUnassignProcedures(MergeTableRegionsProcedure.java:648)
      at org.apache.hadoop.hbase.master.assignment.MergeTableRegionsProcedure.executeFromState(MergeTableRegionsProcedure.java:205)
      at org.apache.hadoop.hbase.master.assignment.MergeTableRegionsProcedure.executeFromState(MergeTableRegionsProcedure.java:79)
      at org.apache.hadoop.hbase.procedure2.StateMachineProcedure.execute(StateMachineProcedure.java:188)
      at org.apache.hadoop.hbase.procedure2.Procedure.doExecute(Procedure.java:922)
      at org.apache.hadoop.hbase.procedure2.ProcedureExecutor.execProcedure(ProcedureExecutor.java:1650)
      at org.apache.hadoop.hbase.procedure2.ProcedureExecutor.executeProcedure(ProcedureExecutor.java:1396)
      at org.apache.hadoop.hbase.procedure2.ProcedureExecutor.access$1000(ProcedureExecutor.java:75)
      at org.apache.hadoop.hbase.procedure2.ProcedureExecutor$WorkerThread.runProcedure(ProcedureExecutor.java:1964)
      at org.apache.hadoop.hbase.trace.TraceUtil.trace(TraceUtil.java:216)
      at org.apache.hadoop.hbase.procedure2.ProcedureExecutor$WorkerThread.run(ProcedureExecutor.java:1991)

      Now when we do rollback of failed merge operation we see a issue where region is in state opened until the RS holding it stopped.

      Rollback create a TRSP as below

      2024-02-11 10:53:57,719 DEBUG [PEWorker-31] procedure2.ProcedureExecutor - Stored [pid=26674602, state=RUNNABLE:REGION_STATE_TRANSITION_GET_ASSIGN_CANDIDATE; TransitRegionStateProcedure table=table1, region=a92008b76ccae47d55c590930b837036, ASSIGN]

      and rollback finished successfully

      2024-02-11 10:53:57,721 INFO [PEWorker-31] procedure2.ProcedureExecutor - Rolled back pid=26673594, state=ROLLEDBACK, exception=org.apache.hadoop.hbase.HBaseIOException via master-merge-regions:org.apache.hadoop.hbase.HBaseIOException: The parent region state=MERGING, location=rs-229,60020,1707587658182, table=table1, region=f56752ae9f30fad9de5a80a8ba578e4b is currently in transition, give up; MergeTableRegionsProcedure table=table1, regions=[a92008b76ccae47d55c590930b837036, f56752ae9f30fad9de5a80a8ba578e4b], force=false exec-time=1.4820 sec

      We create a procedure to open the region a92008b76ccae47d55c590930b837036. Intrestingly we didnt close the region as creation of procedure to close regions had thrown exception and not execution of procedure. When we run TRSP it sends a OpenRegionProcedure which is handled by AssignRegionHandler. This handlers on execution suggests that region is already online

      Sequence of events are as follow

      2024-02-11 10:53:58,919 INFO [PEWorker-58] assignment.RegionStateStore - pid=26674602 updating hbase:meta row=a92008b76ccae47d55c590930b837036, regionState=OPENING, regionLocation=rs-210,60020,1707596461539

      _2024-02-11 10:53:58,920 INFO [PEWorker-58] procedure2.ProcedureExecutor - Initialized subprocedures=[

      {pid=26675798, ppid=26674602, state=RUNNABLE; OpenRegionProcedure a92008b76ccae47d55c590930b837036, server=rs-210,60020,1707596461539}

      ]_

      2024-02-11 10:53:59,074 WARN [REGION-regionserver/rs-210:60020-10] handler.AssignRegionHandler - Received OPEN for table1,r1,1685436252488.a92008b76ccae47d55c590930b837036. which is already online

      Attachments

        Activity

          This comment will be Viewable by All Users Viewable by All Users
          Cancel

          People

            mnpoonia Aman Poonia Assign to me
            mnpoonia Aman Poonia
            Votes:
            0 Vote for this issue
            Watchers:
            7 Start watching this issue

            Dates

              Created:
              Updated:
              Resolved:

              Slack

                Issue deployment