[IMPALA-1726] Statestore should garbage collect hung connections - ASF JIRA

Details

Type: Bug
Status: Resolved
Priority: Critical
Resolution: Fixed
Affects Version/s: Impala 2.1
Fix Version/s: Impala 2.2
Component/s: None
Labels:
- ramp-up

Description

If a node is truly hung, the statestore can apparently wait forever for the heartbeat response. We need to check the TCP timeouts on the connections from the statestore to its subscribers.
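
As an illustration of what a bounded heartbeat wait looks like at the socket level, here is a minimal Python sketch. It is not Impala's code (the statestore is C++ and uses Thrift RPCs); the function name `heartbeat_with_timeout`, the payload, and the timeout value are all hypothetical.

```python
import socket


def heartbeat_with_timeout(sock, payload, timeout_seconds):
    """Send a heartbeat and wait a bounded time for any reply.

    Hypothetical helper: returns True if the peer answered within
    timeout_seconds, False if the wait timed out or the socket failed.
    """
    # Without a timeout, recv() on a blocking socket can wait indefinitely
    # when the peer is hung but its connection is still open.
    sock.settimeout(timeout_seconds)
    try:
        sock.sendall(payload)
        reply = sock.recv(4096)
    except OSError:
        # socket.timeout is a subclass of OSError, so this covers both
        # timeouts and hard socket errors.
        return False
    # An empty read means the peer closed the connection.
    return reply != b""
```

With a timeout in place, a hung subscriber shows up as a failed heartbeat rather than a stuck thread.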

Since the operating system can also interfere, we should periodically visit all heartbeat threads and check how long each has been in the heartbeat RPC. I think we can forcibly close the socket from a GC thread if the RPC has taken too long. The next heartbeat attempt should then hit the TCP connection timeout (or be refused), and the subscriber should be marked as dead.
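
The GC idea above could be sketched roughly as follows. This is a Python illustration under stated assumptions, not the fix that was committed: the class `HeartbeatWatchdog`, its methods, and the `max_rpc_seconds` parameter are hypothetical names, and a real implementation would live in the statestore's C++ heartbeat code.

```python
import socket
import threading
import time


class HeartbeatWatchdog:
    """Tracks in-flight heartbeat RPCs and closes sockets stuck past a deadline."""

    def __init__(self, max_rpc_seconds):
        self._max_rpc_seconds = max_rpc_seconds
        self._lock = threading.Lock()
        self._in_flight = {}  # subscriber_id -> (start_time, socket)

    def rpc_started(self, subscriber_id, sock, now=None):
        # Called by a heartbeat thread just before it issues the RPC.
        start = time.monotonic() if now is None else now
        with self._lock:
            self._in_flight[subscriber_id] = (start, sock)

    def rpc_finished(self, subscriber_id):
        # Called by a heartbeat thread when the RPC returns or fails.
        with self._lock:
            self._in_flight.pop(subscriber_id, None)

    def sweep(self, now=None):
        """Close sockets whose RPC has run too long; return their subscriber ids."""
        now = time.monotonic() if now is None else now
        with self._lock:
            expired = [
                (sid, sock)
                for sid, (start, sock) in self._in_flight.items()
                if now - start > self._max_rpc_seconds
            ]
            for sid, _ in expired:
                del self._in_flight[sid]
        closed = []
        for sid, sock in expired:
            try:
                # shutdown() is intended to wake a thread blocked on this
                # socket; exact behaviour is platform-dependent.
                sock.shutdown(socket.SHUT_RDWR)
            except OSError:
                pass  # Socket may already be disconnected.
            sock.close()
            closed.append(sid)
        return closed
```

A GC thread would call `sweep()` on a fixed interval. The heartbeat thread whose socket was closed then sees an error, and the subscriber can be marked as dead on the next failed attempt.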

People

Assignee: Henry Robinson

Reporter: Henry Robinson

Votes: 0

Watchers: 7

Dates

Created: 02/Feb/15 18:41

Updated: 04/Jan/17 23:58

Resolved: 17/Mar/15 16:26