Details
Bug
Status: Resolved
Blocker
Resolution: Cannot Reproduce
Impala 2.7.0
None
Description
We saw this hang in testing. The Impalad became unresponsive: the debug web page did not load, fragments did not start, etc. The .threads files contain stack traces from two separate occurrences of the hang on the same cluster. The lock order graphs are an analysis of the lock acquisition order, but I haven't found the actual deadlock in the current stacks. The only cycle in those graphs corresponds to IMPALA-4409, and there was no evidence of it in the stacks.
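A lock order graph can be analyzed roughly as follows. This is a minimal sketch under stated assumptions, not the analysis actually used here. Each observed pair (held, acquired), meaning a thread took lock `acquired` while already holding lock `held`, becomes a directed edge. Any cycle in the graph is a potential deadlock (a lock-order inversion). The function names and lock names below are hypothetical placeholders, not Impala's actual locks.

```python
from collections import defaultdict


def build_lock_order_graph(acquisitions):
    """Build a lock order graph from (held_lock, acquired_lock) pairs.

    Hypothetical helper: an edge held -> acquired means some thread acquired
    `acquired` while already holding `held`.
    """
    graph = defaultdict(set)
    for held, acquired in acquisitions:
        graph[held].add(acquired)
        graph.setdefault(acquired, set())  # make sure every lock is a node
    return dict(graph)


def find_cycle(graph):
    """Return one cycle as a list of locks (first lock repeated at the end), or None.

    Iterative DFS with three colors: WHITE (unvisited), GRAY (on the current
    DFS path), BLACK (fully explored). An edge to a GRAY node closes a cycle.
    """
    WHITE, GRAY, BLACK = 0, 1, 2
    color = {node: WHITE for node in graph}
    parent = {}
    for start in graph:
        if color[start] != WHITE:
            continue
        color[start] = GRAY
        stack = [(start, iter(sorted(graph[start])))]
        while stack:
            node, children = stack[-1]
            for child in children:
                if color[child] == GRAY:
                    # Back edge: walk parent links from node up to child.
                    cycle = [node]
                    while cycle[-1] != child:
                        cycle.append(parent[cycle[-1]])
                    cycle.reverse()
                    return cycle + [child]
                if color[child] == WHITE:
                    color[child] = GRAY
                    parent[child] = node
                    stack.append((child, iter(sorted(graph[child]))))
                    break
            else:
                # All children explored: node is no longer on the DFS path.
                color[node] = BLACK
                stack.pop()
    return None


edges = [("lock_a", "lock_b"), ("lock_b", "lock_c"), ("lock_c", "lock_a")]
print(find_cycle(build_lock_order_graph(edges)))
```

A cycle in this graph only shows that some interleaving could deadlock. Whether that interleaving was actually hit still has to be checked against the stacks, which is the distinction drawn above for IMPALA-4409.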
I don't think any of these related issues apply:
- IMPALA-4037 involves ChildQuery::Cancel, which we don't see in the stacks.
- IMPALA-4038 involves a query holding a coordinator lock while doing an RPC, which we don't see in the stacks.
- IMPALA-4409 involves CancelInternal(), but would require a thread to be blocked in CancelInternal() itself after GetQueryExecState() returns. Instead, we see all the stacks sitting in GetQueryExecState().
It looks like a classic deadlock, but I haven't been able to identify the cycle of dependencies.
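The cycle of dependencies in an actual hang can be framed as a wait-for graph over threads. Thread T waits for thread U if T is blocked acquiring a lock that U currently holds. The following is a minimal sketch. It assumes the held and awaited locks for each thread have already been extracted from the stack dumps. Stack traces usually show only the lock being waited on, so the holder information is itself an assumption. Thread IDs, lock names, and the function name are hypothetical.

```python
def find_wait_cycle(holds, waits_on):
    """Find threads that each wait on a lock held by the next thread.

    Hypothetical helper; exclusive (non-shared) locks are assumed.
    holds:    dict thread_id -> set of lock names the thread currently holds
    waits_on: dict thread_id -> lock name the thread is blocked acquiring
    Returns a list of thread IDs forming a cycle (first repeated at the end),
    or None if no cycle exists.
    """
    # Map each lock to the thread holding it.
    holder = {}
    for thread, locks in holds.items():
        for lock in locks:
            holder[lock] = thread

    # A blocked thread waits on one lock, so it waits for at most one other
    # thread: each node has out-degree <= 1 and we can simply follow the chain.
    def next_thread(t):
        lock = waits_on.get(t)
        return holder.get(lock) if lock is not None else None

    for start in waits_on:
        path = []   # threads in the order visited
        index = {}  # thread -> position in path
        t = start
        while t is not None and t not in index:
            index[t] = len(path)
            path.append(t)
            t = next_thread(t)
        if t is not None:
            # t was reached twice: the cycle is the part of path starting at t.
            return path[index[t]:] + [t]
    return None


print(find_wait_cycle({"T1": {"lock_b"}, "T2": {"lock_a"}},
                      {"T1": "lock_a", "T2": "lock_b"}))
```

If every chain ends at a thread that is not waiting on a lock (for example, one blocked on an RPC or a condition variable), the wait-for graph has no cycle, and the hang is not a pure lock-ordering deadlock.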