Details
- Bug
- Status: Resolved
- Critical
- Resolution: Fixed
- Impala 2.3.0
Description
The coordinator currently waits indefinitely if it does not hear back from a backend, so a network error or similar failure can leave a query hanging forever.
We should add logic to determine when a backend is unresponsive and kill the query. The logic should mostly revolve around Coordinator::Wait() and Coordinator::UpdateFragmentExecStatus(), based on whether the coordinator receives periodic updates from a backend (via FragmentExecState::ReportStatusCb()).
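A minimal sketch of the proposed detection logic follows. It is not Impala code: BackendLivenessTracker, its methods, and the timeout value are hypothetical names introduced for illustration. The idea is that the status-update path records the time of each report, and the wait loop periodically asks which backends have been silent for longer than a timeout, then cancels the query if any are found.

```cpp
#include <chrono>
#include <mutex>
#include <string>
#include <unordered_map>
#include <vector>

// Hypothetical helper (not part of Impala): remembers when each backend last
// sent a status report so the coordinator can spot backends that went silent.
class BackendLivenessTracker {
 public:
  using Clock = std::chrono::steady_clock;

  explicit BackendLivenessTracker(Clock::duration timeout) : timeout_(timeout) {}

  // Called when a backend starts executing; the start time counts as a report.
  void Register(const std::string& backend, Clock::time_point now) {
    std::lock_guard<std::mutex> l(lock_);
    last_report_[backend] = now;
  }

  // Called from the status-update path each time a periodic report arrives.
  // Reports from unknown or already finished backends are ignored.
  void RecordReport(const std::string& backend, Clock::time_point now) {
    std::lock_guard<std::mutex> l(lock_);
    auto it = last_report_.find(backend);
    if (it != last_report_.end()) it->second = now;
  }

  // Called when a backend sends its final report; it is no longer monitored.
  void MarkDone(const std::string& backend) {
    std::lock_guard<std::mutex> l(lock_);
    last_report_.erase(backend);
  }

  // Returns the backends whose last report is older than the timeout.
  std::vector<std::string> FindUnresponsive(Clock::time_point now) const {
    std::lock_guard<std::mutex> l(lock_);
    std::vector<std::string> stale;
    for (const auto& entry : last_report_) {
      if (now - entry.second > timeout_) stale.push_back(entry.first);
    }
    return stale;
  }

 private:
  const Clock::duration timeout_;
  mutable std::mutex lock_;
  std::unordered_map<std::string, Clock::time_point> last_report_;
};
```

Under these assumptions, a wait loop would call FindUnresponsive() at a fixed interval instead of blocking forever, and would cancel the query with an error naming the silent backends if the result is non-empty. Passing the current time in as a parameter keeps the check deterministic and easy to test.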
Issue Links
- blocks
  - IMPALA-6787 On large secure clusters the connection setup thread becomes bottleneck at warmup and cause occasional timeout failures (Resolved)
  - IMPALA-6338 Tests fail due to runtime profile for query with limit missing pieces (Resolved)
- incorporates
  - IMPALA-4555 Don't cancel query for failed ReportExecStatus (done=false) RPC (Resolved)
- is depended upon by
  - IMPALA-5119 Don't make RPCs from Coordinator::UpdateBackendExecStatus() (Open)
- is duplicated by
  - IMPALA-414 Impala server cannot detect crash-restart failures reliably (Resolved)
- is related to
  - IMPALA-4063 Make fragment instance reports per-query (or per-host) instead of per-fragment instance. (Resolved)
  - IMPALA-6596 Query failed with OOM on coordinator while remote fragments on other nodes continue to run (Open)
  - IMPALA-9919 Bad Impala Performance after a period of time (Open)
  - IMPALA-2567 KRPC milestone 1 (Resolved)
  - IMPALA-5576 Wrong Cancel() in QueryState::ReportExecStatusAux() can lead to coordinator hang (Resolved)
  - IMPALA-5746 Remote fragments continue to hold onto memory after stopping the coordinator daemon (Resolved)
  - IMPALA-8327 TestRPCTimeout::test_reportexecstatus_retry() times out on exhaustive build (Resolved)
  - IMPALA-3160 Queries may not get cancelled if cancellation pool hits MAX_CANCELLATION_QUEUE_SIZE (Resolved)
  - IMPALA-539 Impala should gather final runtime profile from fragments for aborted/cancelled query (Resolved)
- Parent Feature
  - IMPALA-3380 Add TCP timeouts to all RPCs that don't block (Resolved)
- relates to
  - IMPALA-6984 Coordinator should cancel backends when returning EOS (Reopened)
- requires
  - IMPALA-7163 Implement a state machine for the QueryState class (Resolved)
1. Port ReportExecStatus() RPCs to KRPC | Resolved | Michael Ho
2. Deprecate --use_krpc flag | Resolved | Michael Ho
3. Implement a state machine for the QueryState class | Resolved | Sailesh Mukil
4. Make fragment instance reports per-query (or per-host) instead of per-fragment instance. | Resolved | Michael Ho
5. Use sidecars for Thrift-wrapped RPC payloads | Resolved | Unassigned
6. Don't cancel query for failed ReportExecStatus (done=false) RPC | Resolved | Thomas Tauber-Marshall