Description
See the 'Deadlock' scenario in the parent issue. Doing this as a focused subtask since the parent has a few things going on in it.
Let me reproduce it below:
From HBASE-20137, 'TestRSGroups is Flakey', https://issues.apache.org/jira/browse/HBASE-20137?focusedCommentId=16390325&page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#comment-16390325
- SCP is running because a server was aborted in test.
- SCP starts AssignProcedure of region X from crashed server.
- DisableTable Procedure runs because test has finished and we're doing table delete. Queues UnassignProcedure for region X.
- Disable Unassign gets Lock on region X first.
- SCP AssignProcedure tries to get lock, waits on lock.
- DisableTable Procedure UnassignProcedure RPC fails because server is down (that's why the SCP is running).
- Tries to expire the server it failed the RPC against. Fails (currently being SCP'd).
- DisableTable Procedure Unassign is suspended. It suspends with the lock on region X still held.
- SCP can't run because the lock on X is held.
- Test times out.
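The sequence above can be replayed in a small single-threaded model. This is only a sketch: `RegionLockDeadlockSketch`, `RegionLock`, and `simulate` are hypothetical names invented for illustration, not HBase classes, and the model assumes that nothing wakes the suspended Unassign while the SCP's Assign is blocked on the lock.

```java
import java.util.ArrayList;
import java.util.List;

/** Hypothetical model of the scenario; none of these names are HBase APIs. */
public class RegionLockDeadlockSketch {

    /** Exclusive lock on a single region, tracked by owner name. */
    static final class RegionLock {
        private String owner; // null when the lock is free

        boolean tryAcquire(String who) {
            if (owner == null) {
                owner = who;
                return true;
            }
            return owner.equals(who);
        }

        String owner() {
            return owner;
        }
    }

    /** Replays the steps from the description and returns a trace. */
    public static List<String> simulate() {
        List<String> trace = new ArrayList<>();
        RegionLock regionX = new RegionLock();

        // The SCP is already running for the crashed server.
        boolean serverBeingScpd = true;

        // Disable's Unassign gets the lock on region X first.
        boolean unassignHasLock = regionX.tryAcquire("Unassign");
        trace.add("Unassign acquired lock: " + unassignHasLock);

        // The SCP's Assign tries to get the same lock and has to wait.
        boolean assignHasLock = regionX.tryAcquire("Assign");
        trace.add("Assign acquired lock: " + assignHasLock);

        // The Unassign RPC fails because the server is down, and expiring
        // the server fails because it is already being processed by the SCP.
        boolean rpcSucceeded = false;
        boolean serverExpired = !serverBeingScpd;

        // The Unassign suspends WITHOUT releasing the lock on region X.
        boolean suspended = !rpcSucceeded && !serverExpired;
        boolean suspendedHoldingLock =
            suspended && "Unassign".equals(regionX.owner());
        trace.add("Unassign suspended holding lock: " + suspendedHoldingLock);

        // Assumption of this sketch: nothing wakes the Unassign, so a retry
        // by the Assign still finds the lock held and the SCP cannot proceed.
        boolean assignCanRun = regionX.tryAcquire("Assign");
        trace.add("Deadlocked: " + (suspendedHoldingLock && !assignCanRun));
        return trace;
    }

    public static void main(String[] args) {
        simulate().forEach(System.out::println);
    }
}
```

In this model the last trace line reports `Deadlocked: true`, because the lock owner never changes after the Unassign suspends. If the suspended procedure released the region lock before suspending, the Assign's retry would succeed in the same model.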
Attachments
Issue Links
- is related to: HBASE-20634 Reopen region while server crash can cause the procedure to be stuck (Resolved)