IMPALA-414

Impala server cannot detect crash-restart failures reliably

    Details

    • Type: Improvement
    • Status: Resolved
    • Priority: Minor
    • Resolution: Duplicate
    • Affects Version/s: Impala 1.0.1
    • Fix Version/s: None
    • Component/s: Distributed Exec
    • Labels:

      Description

      The membership mechanism used to tell Impala servers about failures does not always detect fast crash-restarts. If a server restarts and re-registers before the state-store recognises that it has failed, the failure won't get reported to any other subscriber.

      The right way to fix this, I think, is to track a version number for every subscriber. Each time a subscriber reconnects, it gets a new, higher version number. For every query, we record the highest subscriber version number known when the query starts. If any backend executing that query later has a higher version number, it has probably restarted since the query started. There might be a few false positives, since a node could conceivably restart between a scheduling assignment and actually receiving the query, but that is unlikely and better than false negatives.
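
A minimal sketch of the version-number idea, written in Python for brevity rather than as Impala code. The names `Membership`, `Query`, `register`, and `restarted_backends` are hypothetical, and the sketch assumes version numbers come from one global, monotonically increasing counter, which the description does not specify.

```python
import itertools


class Membership:
    """Hypothetical stand-in for the state-store's subscriber registry."""

    def __init__(self):
        self._counter = itertools.count(1)  # assumption: one global counter
        self._versions = {}                 # subscriber id -> current version
        self.highest_version = 0            # highest version handed out so far

    def register(self, subscriber_id):
        # Every registration or re-registration gets a fresh, larger version.
        version = next(self._counter)
        self._versions[subscriber_id] = version
        self.highest_version = version
        return version

    def version_of(self, subscriber_id):
        return self._versions[subscriber_id]


class Query:
    """Remembers the highest subscriber version known when the query starts."""

    def __init__(self, membership, backends):
        self.backends = list(backends)
        self.highest_version_at_start = membership.highest_version

    def restarted_backends(self, membership):
        # A backend whose version exceeds the recorded one must have
        # re-registered after the query started.
        return [
            b for b in self.backends
            if membership.version_of(b) > self.highest_version_at_start
        ]


membership = Membership()
membership.register("backend-a")
membership.register("backend-b")

query = Query(membership, ["backend-a", "backend-b"])
unchanged = query.restarted_backends(membership)

membership.register("backend-b")  # fast crash-restart: re-registers right away
restarted = query.restarted_backends(membership)
```

Because the check compares version numbers rather than relying on the state-store noticing a missed heartbeat, the re-registration of `backend-b` is flagged even if the restart was fast enough that no failure was ever reported.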

              People

              • Assignee: Unassigned
              • Reporter: Henry Robinson (henryr)
              • Votes: 0
              • Watchers: 1

                Dates

                • Created:
                  Updated:
                  Resolved: