Details
Type: Bug
Status: Resolved
Priority: Major
Resolution: Cannot Reproduce
Affects Version/s: Impala 2.5.0
None
None
Environment: Enterprise CDH 5.7.0, Parcels
impalad version 2.5.0-cdh5.7.0 RELEASE (build ad3f5adabedf56fe6bd9eea39147c067cc552703)
Description
Occasionally a single impalad host comes under heavy load. When this happens, the entire cluster hangs and no Impala queries can execute.
Here's what we observe:
- Load increases on a single impalad.
- Query throughput across the entire Impala cluster drops, and we cannot get any queries to execute.
- The number of running threads keeps increasing until we restart the Impala service.
- The impalad logs show errors connecting to the unhealthy host, for example: Couldn't open transport for ux-reporting-engine-worker-23-prod-us-east-1a:22000 (connect() failed: Connection timed out)
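A rough back-of-the-envelope model may help explain the growing thread count. It is an assumption, not something measured in this report: suppose every query that needs the unhealthy host holds one thread until connect() times out. Threads then pile up at the query arrival rate and are only released after the timeout (Little's law: blocked threads ≈ arrival rate × wait time). The arrival rate and the roughly 127 s timeout (the Linux default with `tcp_syn_retries=6`) below are hypothetical figures, and `blocked_threads` is an illustrative helper, not part of Impala.

```python
def blocked_threads(t_seconds: float, arrival_qps: float, connect_timeout_s: float) -> float:
    """Estimate threads stuck on connect() to the unhealthy host at time t.

    Simplified model (assumption): starting at t=0, every arriving query
    blocks one thread on the unreachable host until connect() times out.
    Threads started more than `connect_timeout_s` ago have already given up,
    so only the most recent window of arrivals is still blocked.
    """
    window = min(t_seconds, connect_timeout_s)
    return arrival_qps * window


if __name__ == "__main__":
    # Hypothetical figures: 20 queries/s against a ~127 s TCP connect timeout.
    for t in (10, 60, 127, 300):
        print(f"t={t:>3}s  blocked threads ~ {blocked_threads(t, 20, 127):.0f}")
```

In this model the count levels off at arrival rate × timeout. Anything that makes a query hold threads for longer, such as retries or fragments on healthy hosts waiting on the stuck one, raises that ceiling, which over a typical observation window can look like unbounded growth.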
Questions:
Why does the entire Impala service become unstable because of the health of a single impalad?
In theory, shouldn't the Impala statestore stop the unhealthy impalad from being used, so that the healthy nodes can keep processing queries?
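One way to frame the second question is the difference between liveness and health. The sketch below assumes, without confirmation from this report, that the overloaded host keeps answering statestore heartbeats while connections to its backend port (22000) time out. Under that assumption it stays in cluster membership, so any query with work assigned to it still waits on it. A health-aware scheduler would additionally drop hosts after failed RPCs, which is roughly what the linked IMPALA-9299 proposes. `Backend`, `membership`, `schedulable`, `record_rpc_failure`, and the host names are all hypothetical illustrations, not Impala APIs.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Backend:
    host: str
    heartbeat_ok: bool   # still answers statestore heartbeats (liveness)
    rpc_reachable: bool  # backend port accepts connections (health)


def membership(backends):
    """Liveness-only view: a host stays a member while it answers heartbeats."""
    return [b for b in backends if b.heartbeat_ok]


def schedulable(backends, blacklist):
    """Health-aware view: members minus hosts blacklisted after RPC failures."""
    return [b for b in membership(backends) if b.host not in blacklist]


def record_rpc_failure(blacklist, backend):
    """Hypothetical coordinator-side blacklisting after a failed connect()."""
    return blacklist | {backend.host}


if __name__ == "__main__":
    cluster = [
        Backend("worker-22", heartbeat_ok=True, rpc_reachable=True),
        Backend("worker-23", heartbeat_ok=True, rpc_reachable=False),  # overloaded
        Backend("worker-24", heartbeat_ok=True, rpc_reachable=True),
    ]
    blacklist = frozenset()
    for b in membership(cluster):
        if not b.rpc_reachable:  # the coordinator's connect() would time out here
            blacklist = record_rpc_failure(blacklist, b)
    print("members:    ", [b.host for b in membership(cluster)])
    print("schedulable:", [b.host for b in schedulable(cluster, blacklist)])
```

With liveness alone, worker-23 remains a member even though every connection to it fails; only the health-aware filter removes it from scheduling.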
Attachments
Issue Links
relates to
- IMPALA-5865 Improve Impala execution scalability (Open)
- IMPALA-2567 KRPC milestone 1 (Resolved)
- IMPALA-9299 Node Blacklisting: Coordinators should blacklist unhealthy nodes (Open)