IMPALA / IMPALA-1726

Statestore should garbage collect hung connections

    Details

    • Type: Bug
    • Status: Resolved
    • Priority: Critical
    • Resolution: Fixed
    • Affects Version/s: Impala 2.1
    • Fix Version/s: Impala 2.2
    • Component/s: None
    • Labels:

      Description

      If a node is truly hung, the statestore appears to wait forever for the heartbeat response. We need to check the TCP timeouts on the connections from the statestore to the subscriber.
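
To illustrate what a receive timeout buys us, here is a minimal sketch in Python (not Impala's C++ statestore code). The helper names `recv_with_timeout` and `demo_timeout` are made up for this example, and a local socket pair stands in for the statestore-to-subscriber connection. With a timeout set, a silent peer produces an error instead of an indefinite block.

```python
import socket


def recv_with_timeout(sock, timeout_seconds):
    """Return the reply bytes, or None if the peer stays silent too long."""
    sock.settimeout(timeout_seconds)
    try:
        return sock.recv(1)
    except socket.timeout:
        # No heartbeat response arrived within the timeout.
        return None


def demo_timeout():
    # The "subscriber" end never writes, which simulates a hung node.
    statestore_end, subscriber_end = socket.socketpair()
    try:
        return recv_with_timeout(statestore_end, 0.1)
    finally:
        statestore_end.close()
        subscriber_end.close()
```

Calling `demo_timeout()` returns `None` because the simulated subscriber never replies. Whether the real connections already have such timeouts configured is exactly what this issue asks us to check.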

      Since the operating system can also interfere, we should periodically visit all heartbeat threads and check how long each has been blocked in its heartbeat RPC. I think we can forcibly close the socket from a GC thread if the RPC has taken too long. The next heartbeat attempt should then hit the TCP connection timeout (or be refused), and the subscriber should be marked as dead.
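
The GC idea above could look roughly like the following sketch. It is written in Python for brevity and is not the fix that went into Impala. `HeartbeatTracker`, `heartbeat`, `demo`, and the `failed` set are hypothetical names, and a local socket pair simulates a hung subscriber. As a simplification, the sketch records the failed heartbeat immediately rather than waiting for the next attempt to time out or be refused.

```python
import socket
import threading
import time


class HeartbeatTracker:
    """Tracks in-flight heartbeat RPCs so a GC pass can close stuck ones."""

    def __init__(self, max_rpc_seconds):
        self.max_rpc_seconds = max_rpc_seconds
        self._lock = threading.Lock()
        self._in_flight = {}  # subscriber id -> (socket, start time)

    def begin(self, subscriber_id, sock):
        with self._lock:
            self._in_flight[subscriber_id] = (sock, time.monotonic())

    def end(self, subscriber_id):
        with self._lock:
            self._in_flight.pop(subscriber_id, None)

    def collect_hung(self):
        """Shut down sockets whose RPC exceeded the limit; return their ids."""
        now = time.monotonic()
        with self._lock:
            hung = [(sid, sock)
                    for sid, (sock, start) in self._in_flight.items()
                    if now - start > self.max_rpc_seconds]
        for _, sock in hung:
            try:
                sock.shutdown(socket.SHUT_RDWR)  # wakes the blocked recv()
            except OSError:
                pass  # socket was already closed
        return [sid for sid, _ in hung]


def heartbeat(tracker, subscriber_id, sock, failed):
    """Wait for a heartbeat reply; record the subscriber if none arrives."""
    tracker.begin(subscriber_id, sock)
    try:
        reply = sock.recv(1)  # blocks until a reply or a forced shutdown
    except OSError:
        reply = b""
    finally:
        tracker.end(subscriber_id)
    if not reply:
        failed.add(subscriber_id)
    sock.close()


def demo():
    # The subscriber end never replies, which simulates a hung node.
    statestore_end, subscriber_end = socket.socketpair()
    tracker = HeartbeatTracker(max_rpc_seconds=0.2)
    failed = set()
    worker = threading.Thread(
        target=heartbeat, args=(tracker, "sub-1", statestore_end, failed))
    worker.start()
    time.sleep(0.5)  # let the RPC exceed the limit
    collected = tracker.collect_hung()
    worker.join(timeout=2)
    subscriber_end.close()
    return collected, failed, worker.is_alive()
```

In a real statestore the `collect_hung` pass would run periodically on its own thread. The sketch uses `shutdown` rather than a bare `close` because closing a descriptor from another thread does not reliably wake a thread that is already blocked in `recv`.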

            People

            • Assignee: Henry Robinson (henryr)
            • Reporter: Henry Robinson (henryr)
