As reported on the mailing list, this is a regression due to
IMPALA-7096 (7ccf7369085aa49a8fc0daf6f91d97b8a3135682). The scanner thread has the following code:
What if we have the following scenario:
T1) grab scan range 1 and start processing
T2) grab scan range 2 and start processing
T1) finish scan range 1 and see that 'progress_' is not done()
T1) loop around, get no scan range (there are no more), so set all_ranges_started_ and break
T1) thread exits
T2) finish scan range 2
T2) happen to hit a soft memory limit error due to pressure from other exec nodes, etc. Since we aren't the first thread, we break (even though the first thread is no longer running).
T2) thread exits
Note that neither thread got to the point of calling SetDone(): T1 exited while progress_ was not yet done(), and T2 broke out on the memory limit error before checking progress_.done().
Thus, the query will hang forever.
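
The interleaving above can be replayed deterministically with a small model. This is only a sketch for illustration, not the actual Impala code: ScanNodeModel, GrabRange, FinishRange and ReplayInterleaving are hypothetical names, and the order of checks in FinishRange is an assumption taken from the description above (the soft memory limit check comes before the progress check).

```cpp
#include <cassert>

// Hypothetical, simplified stand-in for the scan node's shared state.
struct ScanNodeModel {
  int ranges_remaining;             // scan ranges not yet handed out
  int total_ranges;                 // total scan ranges for this node
  int ranges_completed = 0;         // plays the role of progress_
  bool all_ranges_started = false;  // plays the role of all_ranges_started_
  bool done = false;                // becomes true only via SetDone()

  explicit ScanNodeModel(int n) : ranges_remaining(n), total_ranges(n) {}
  bool ProgressDone() const { return ranges_completed == total_ranges; }
  void SetDone() { done = true; }
};

// A thread asks for the next scan range. Returns false when none are left,
// in which case the thread marks all ranges as started and exits.
bool GrabRange(ScanNodeModel& node) {
  if (node.ranges_remaining == 0) {
    node.all_ranges_started = true;
    return false;
  }
  --node.ranges_remaining;
  return true;
}

// A thread finishes its current scan range. Returns true if the thread
// should loop around for another range, false if it exits.
bool FinishRange(ScanNodeModel& node, bool is_first_thread,
                 bool soft_mem_limit_exceeded) {
  ++node.ranges_completed;
  // Assumed ordering: the memory limit check comes first, so a non-first
  // thread breaks out here without ever looking at the progress.
  if (soft_mem_limit_exceeded && !is_first_thread) return false;
  if (node.ProgressDone()) {
    node.SetDone();
    return false;
  }
  return true;
}

// Replays the T1/T2 schedule from the description with two scan ranges.
// Returns whether SetDone() was ever called.
bool ReplayInterleaving(bool t2_hits_soft_mem_limit) {
  ScanNodeModel node(2);
  bool t1_got = GrabRange(node);  // T1 grabs scan range 1
  bool t2_got = GrabRange(node);  // T2 grabs scan range 2
  assert(t1_got && t2_got);
  (void)t1_got;
  (void)t2_got;

  bool t1_loops = FinishRange(node, /*is_first_thread=*/true,
                              /*soft_mem_limit_exceeded=*/false);
  assert(t1_loops);  // progress is not done yet, so T1 loops around
  (void)t1_loops;
  GrabRange(node);   // T1 finds no range, sets all_ranges_started, exits

  // T2 finishes scan range 2 and exits either way.
  FinishRange(node, /*is_first_thread=*/false, t2_hits_soft_mem_limit);
  return node.done;
}
```

In this model, ReplayInterleaving(true) leaves done unset after both threads have exited, which corresponds to the hang. ReplayInterleaving(false) reaches SetDone(), so the hang depends on the memory limit error arriving at that particular point.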