IMPALA-1726

Statestore should garbage collect hung connections

Details

    • Type: Bug
    • Status: Resolved
    • Priority: Critical
    • Resolution: Fixed
    • Affects Version/s: Impala 2.1
    • Fix Version/s: Impala 2.2
    • None

    Description

      If a node is truly hung, the statestore may wait indefinitely for the heartbeat response. We need to check the TCP timeouts on the connections from the statestore to the subscriber.
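
To illustrate the timeout concern above, here is a minimal sketch in Python of a heartbeat client whose connect and send/receive calls are both bounded. It is not Impala's code (the statestore is a C++ service); the function names `open_heartbeat_socket` and `send_heartbeat`, the payload, and the timeout values are assumptions chosen for illustration.

```python
import socket


def open_heartbeat_socket(host, port, connect_timeout_s=2.0, rpc_timeout_s=1.0):
    """Connect with a bounded connect time, then bound every later send/recv.

    Hypothetical helper: the names and default timeouts are illustrative only.
    """
    sock = socket.create_connection((host, port), timeout=connect_timeout_s)
    # Without this, a blocking recv() on a hung peer can wait indefinitely.
    sock.settimeout(rpc_timeout_s)
    return sock


def send_heartbeat(sock, payload=b"heartbeat\n"):
    """Return True if the peer replied in time, False on timeout or socket error."""
    try:
        sock.sendall(payload)
        # An empty read means the peer closed the connection.
        return len(sock.recv(1024)) > 0
    except OSError:
        # socket.timeout is a subclass of OSError, so timeouts land here too.
        return False
```

A caller would treat a `False` result as a failed heartbeat and count it toward marking the subscriber as dead.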

      Since the operating system can also interfere, we should periodically visit all heartbeat threads and check how long each has been in its heartbeat RPC. I think we can forcibly close the socket from a GC thread if the RPC has taken too long. The next heartbeat attempt should then hit the TCP connection timeout (or be refused), and the subscriber should be marked as dead.
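
The GC idea described above could be sketched roughly as follows. This is an illustration in Python, not the fix that was committed for this issue: `HeartbeatTracker`, `run_gc`, and all parameter names are hypothetical. The sketch assumes each heartbeat thread registers its socket before starting the RPC and deregisters it afterwards.

```python
import socket
import threading
import time


class HeartbeatTracker:
    """Records when each in-flight heartbeat RPC started (hypothetical helper)."""

    def __init__(self):
        self._lock = threading.Lock()
        self._in_flight = {}  # subscriber id -> (socket, monotonic start time)

    def begin(self, subscriber_id, sock):
        """Called by a heartbeat thread just before it issues the RPC."""
        with self._lock:
            self._in_flight[subscriber_id] = (sock, time.monotonic())

    def end(self, subscriber_id):
        """Called by a heartbeat thread when the RPC returns or fails."""
        with self._lock:
            self._in_flight.pop(subscriber_id, None)

    def reap(self, max_rpc_seconds):
        """Force-close sockets whose RPC has run too long; return their ids."""
        now = time.monotonic()
        with self._lock:
            stale = [(sid, sock)
                     for sid, (sock, start) in self._in_flight.items()
                     if now - start > max_rpc_seconds]
            for sid, _ in stale:
                del self._in_flight[sid]
        for _, sock in stale:
            try:
                # On Linux, shutdown() unblocks a thread waiting in recv().
                sock.shutdown(socket.SHUT_RDWR)
            except OSError:
                pass  # the socket was already disconnected
            sock.close()
        return [sid for sid, _ in stale]


def run_gc(tracker, max_rpc_seconds, interval_seconds, stop_event):
    """GC loop: sweep every interval_seconds until stop_event is set."""
    while not stop_event.wait(interval_seconds):
        tracker.reap(max_rpc_seconds)
```

The sockets are closed outside the lock so that a slow `shutdown()` cannot delay heartbeat threads calling `begin()` or `end()`. Once its socket is closed, the heartbeat thread sees an error on its next attempt and can count it as a failed heartbeat.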

          People

            Assignee: Henry Robinson (henryr)
            Reporter: Henry Robinson (henryr)
            Votes: 0
            Watchers: 7
