[IMPALA-9113] Queries can hang if an impalad is killed after a query has FINISHED - ASF JIRA


Details

Type: Bug
Status: Resolved
Priority: Major
Resolution: Not A Problem
Affects Version/s: None
Fix Version/s: None
Component/s: Backend, Clients
Labels: None


Description

There is a race condition in the query coordination code that could cause queries to hang indefinitely in an un-cancellable state if an impalad crashes after the query has transitioned to the FINISHED state, but before all backends have completed.

The issue occurs if:

1. A query produces all of its results.
2. A client issues a fetch request to read those results.
3. The fetch request returns all available rows (i.e. eos is hit).
4. Coordinator::GetNext then calls SetNonErrorTerminalState(ExecState::RETURNED_RESULTS), which eventually calls WaitForBackends().
5. WaitForBackends() blocks until all backends have completed.
6. One of the impalads running the query crashes, and therefore never reports success for the query fragment it was running.
7. The WaitForBackends() call then blocks indefinitely.
8. Any attempt to cancel the query cannot complete, because the original fetch request that drove the WaitForBackends() call still holds the ClientRequestState lock, which prevents cancellation from proceeding.
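The sequence above reduces to a lock held across an unbounded condition-variable wait. The sketch below is a hypothetical Python model, not Impala's code: `BackendTracker`, `wait_for_backends`, `backend_completed`, and `demo_hang` are illustrative names, a plain `threading.Lock` stands in for the ClientRequestState lock, and a bounded `acquire(timeout=...)` stands in for a cancel request so the demo terminates. It shows that while the fetch thread waits on a backend that never reports, a cancel request cannot take the lock.

```python
import threading


class BackendTracker:
    """Hypothetical model of backend-completion tracking (not Impala's API)."""

    def __init__(self, num_backends):
        self._pending = num_backends
        self._cv = threading.Condition()

    def backend_completed(self):
        # Called when a backend reports that its fragment finished.
        with self._cv:
            self._pending -= 1
            if self._pending == 0:
                self._cv.notify_all()

    def wait_for_backends(self):
        # Blocks until every backend has reported; there is no timeout.
        with self._cv:
            self._cv.wait_for(lambda: self._pending == 0)


def demo_hang(cancel_timeout_s=0.2):
    """Returns True if a (modeled) cancel could take the request lock in time."""
    request_lock = threading.Lock()  # models the ClientRequestState lock
    tracker = BackendTracker(num_backends=2)
    tracker.backend_completed()  # backend 1 reports success
    # Backend 2 "crashes" and never calls backend_completed().

    lock_held = threading.Event()

    def fetch():
        with request_lock:               # the fetch that hit eos holds the lock...
            lock_held.set()
            tracker.wait_for_backends()  # ...while waiting for backend 2

    t = threading.Thread(target=fetch)
    t.start()
    lock_held.wait()  # make sure the fetch thread owns the lock first

    cancelled = request_lock.acquire(timeout=cancel_timeout_s)
    if cancelled:
        request_lock.release()

    # Demo only: report the missing backend so the fetch thread can exit.
    tracker.backend_completed()
    t.join()
    return cancelled


print("cancel succeeded" if demo_hang() else "cancel blocked")
```

Running it prints `cancel blocked`. The final `backend_completed()` call exists only so the demo can exit; in the scenario described, the crashed impalad never sends that report.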

Implementing IMPALA-6984 should, in theory, fix this: as soon as eos is hit, the coordinator would call CancelBackends() rather than WaitForBackends(). Another solution would be to add a timeout to WaitForBackends() so that it returns once the timeout expires. This would force the fetch request to return 0 rows with hasMoreRows=true, releasing the ClientRequestState lock and unblocking any cancellation threads.
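The timeout alternative can be sketched with the same kind of hypothetical Python model (not Impala's code): `wait_for_backends` takes a timeout and reports whether all backends finished, and an illustrative `fetch` handler returns no rows with `has_more_rows=True` when the wait times out. Because the handler returns, the request lock is released and a cancel can acquire it. The 0.1 s timeout is arbitrary and chosen only for the demo.

```python
import threading


class BackendTracker:
    """Hypothetical model of backend-completion tracking, now with a timeout."""

    def __init__(self, num_backends):
        self._pending = num_backends
        self._cv = threading.Condition()

    def backend_completed(self):
        # Called when a backend reports that its fragment finished.
        with self._cv:
            self._pending -= 1
            if self._pending == 0:
                self._cv.notify_all()

    def wait_for_backends(self, timeout_s):
        # True if all backends reported, False if the timeout expired first.
        with self._cv:
            return self._cv.wait_for(lambda: self._pending == 0, timeout=timeout_s)


def fetch(request_lock, tracker, wait_timeout_s):
    """Hypothetical fetch handler after eos; returns (rows, has_more_rows)."""
    with request_lock:
        if tracker.wait_for_backends(wait_timeout_s):
            return [], False  # every backend finished: the query is complete
        # Timed out: return no rows but ask the client to fetch again.
        # Leaving the `with` block releases request_lock for a pending cancel.
        return [], True


request_lock = threading.Lock()  # models the ClientRequestState lock
tracker = BackendTracker(num_backends=2)
tracker.backend_completed()  # backend 1 reports; backend 2 never does

rows, has_more_rows = fetch(request_lock, tracker, wait_timeout_s=0.1)
can_cancel = request_lock.acquire(timeout=0.1)  # a cancel can now take the lock
if can_cancel:
    request_lock.release()
print(rows, has_more_rows, can_cancel)
```

Running it prints `[] True True`. The trade-off is that the client sees an extra empty fetch and must poll again, whereas the IMPALA-6984 approach avoids waiting on backends after eos altogether.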


Issue Links

is related to

IMPALA-6984 Coordinator should cancel backends when returning EOS (Reopened)
IMPALA-7312 Non-blocking mode for Fetch() RPC (Resolved)

relates to

IMPALA-9124 Transparently retry queries that fail due to cluster membership changes (In Progress)


People

Assignee: Sahil Takiar

Reporter: Sahil Takiar

Votes: 0

Watchers: 2

Dates

Created: 31/Oct/19 20:51

Updated: 16/Nov/19 00:09

Resolved: 16/Nov/19 00:09