Details
Type: Bug
Status: Resolved
Priority: Blocker
Resolution: Fixed
Impala 2.10.0
None
Description
Discovered while running 'test_finst_cancel_when_query_complete' in a loop to repro a different issue: there's a race in Coordinator::UpdateBackendExecStatus that causes Impala to crash on 'DCHECK_GT(num_remaining_backends_, 0)'.
The problem is that only the first exec report returned for a particular backend after it has completed is supposed to reach line 992, where we decrement 'num_remaining_backends_'. Per the comments, this is supposed to be ensured by the BackendState::IsDone check on line 945.
However, the check and the update aren't performed atomically, so two threads can enter UpdateBackendExecStatus at the same time, both check BackendState::IsDone and find it false, and then both proceed to decrement num_remaining_backends_, with the second one hitting the DCHECK.
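One possible way to close this kind of check-then-act race is sketched below. This is not the actual Impala code or the committed fix: BackendStateSketch, MarkDone and SimulateFinalReports are hypothetical names made up for illustration. The idea is to fold the "is it done?" check and the "mark it done" update into a single critical section, so that exactly one caller observes the transition and only that caller decrements the counter.

```cpp
#include <atomic>
#include <cassert>
#include <mutex>
#include <thread>
#include <vector>

// Hypothetical stand-in for a per-backend state object; not Impala's real class.
class BackendStateSketch {
 public:
  // Checks and sets the done flag under one lock. Returns true only for the
  // single caller that moves this backend from "running" to "done".
  bool MarkDone() {
    std::lock_guard<std::mutex> l(lock_);
    if (done_) return false;  // a previous report already completed this backend
    done_ = true;
    return true;
  }

 private:
  std::mutex lock_;
  bool done_ = false;
};

// Simulates 'num_reports' threads each delivering a final report for the same
// backend, out of 'num_backends' total. Returns the remaining-backend count.
int SimulateFinalReports(int num_backends, int num_reports) {
  std::atomic<int> num_remaining(num_backends);
  BackendStateSketch backend;
  std::vector<std::thread> threads;
  for (int i = 0; i < num_reports; ++i) {
    threads.emplace_back([&] {
      // Only the thread that wins the transition touches the counter.
      if (backend.MarkDone()) {
        int prev = num_remaining.fetch_sub(1);
        assert(prev > 0);  // plays the role of the DCHECK_GT in the report
      }
    });
  }
  for (auto& t : threads) t.join();
  return num_remaining.load();
}
```

Calling SimulateFinalReports(1, 8) models eight concurrent reports for the only backend of a query: the counter is decremented exactly once, whereas with a separate IsDone-style check followed by an unguarded decrement, two threads could both pass the check and the second would trip the assertion.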