HBASE-20173 (sub-task of HBASE-20152: [AMv2] DisableTableProcedure versus ServerCrashProcedure)

[AMv2] DisableTableProcedure concurrent to ServerCrashProcedure can deadlock

Details

    • Type: Sub-task
    • Status: Resolved
    • Priority: Critical
    • Resolution: Fixed
    • Affects Version/s: None
    • Fix Version/s: 2.0.0
    • Component/s: amv2
    • Labels: None

    Description

      See 'Deadlock' scenario in parent issue. Doing as focused subtask since parent has a few things going on in it.

      Let me reproduce it below:

      From HBASE-20137, 'TestRSGroups is Flakey', https://issues.apache.org/jira/browse/HBASE-20137?focusedCommentId=16390325&page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#comment-16390325

      • SCP is running because a server was aborted in the test.
      • SCP starts an AssignProcedure for region X from the crashed server.
      • DisableTableProcedure runs because the test has finished and we're doing table delete. It queues an UnassignProcedure for region X.
      • The DisableTable UnassignProcedure gets the lock on region X first.
      • The SCP AssignProcedure tries to get the lock and waits on it.
      • The DisableTable UnassignProcedure RPC fails because the server is down (that's why the SCP is running).
      • The UnassignProcedure tries to expire the server it failed the RPC against. This fails because the server is currently being SCP'd.
      • The DisableTable UnassignProcedure is suspended. It suspends with the lock on region X still held.
      • SCP can't run because the lock on X is held.
      • Test times out.
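
The sequence above can be illustrated with a small, self-contained Java sketch. It does not use any HBase classes: `RegionLockSketch`, `scpAssignProceeds` and the `releaseLockOnSuspend` flag are hypothetical names invented for this illustration, and a `Semaphore` merely stands in for the lock on region X. The `releaseLockOnSuspend = true` branch is included only for contrast; it is an assumption made for the demo and is not claimed to be what the attached patches do.

```java
import java.util.concurrent.CompletableFuture;
import java.util.concurrent.Semaphore;
import java.util.concurrent.TimeUnit;

// Hypothetical stand-in for the scenario; none of these names exist in HBase.
final class RegionLockSketch {

    /**
     * Replays the steps from the description and reports whether the
     * SCP's assign of region X ever manages to take the region lock.
     */
    static boolean scpAssignProceeds(boolean releaseLockOnSuspend) {
        // One permit models the exclusive lock on region X.
        Semaphore regionXLock = new Semaphore(1);

        // The unassign queued by DisableTable gets the lock on region X first.
        regionXLock.acquireUninterruptibly();

        // The SCP's assign waits on the same lock. The wait is bounded here
        // only so that the demo terminates instead of hanging like the test.
        CompletableFuture<Boolean> scpAssign = CompletableFuture.supplyAsync(() -> {
            try {
                boolean acquired = regionXLock.tryAcquire(300, TimeUnit.MILLISECONDS);
                if (acquired) {
                    regionXLock.release();
                }
                return acquired;
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt();
                return false;
            }
        });

        // The unassign RPC fails (server is down) and expiring the server is
        // refused (it is already being SCP'd), so the unassign suspends.
        // In the reported scenario it suspends with the lock still held.
        if (releaseLockOnSuspend) {
            regionXLock.release(); // contrast case only, an assumption of this sketch
        }

        return scpAssign.join();
    }

    public static void main(String[] args) {
        System.out.println("lock held across suspend, assign proceeds: "
                + scpAssignProceeds(false));
        System.out.println("lock released on suspend, assign proceeds: "
                + scpAssignProceeds(true));
    }
}
```

With the lock held across the suspend, the assign never gets the lock and the call returns false, which corresponds to the test timing out. The contrast case returns true because nothing is left holding the lock while the suspended procedure waits.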

      Attachments

        1. HBASE-20173.branch-2.001.patch
          13 kB
          Michael Stack
        2. HBASE-20173.branch-2.002.patch
          17 kB
          Michael Stack

            People

              Assignee: Michael Stack (stack)
              Reporter: Michael Stack (stack)