IMPALA-9124

Transparently retry queries that fail due to cluster membership changes

    Details

    • Type: New Feature
    • Status: In Progress
    • Priority: Critical
    • Resolution: Unresolved
    • Affects Version/s: None
    • Fix Version/s: None
    • Component/s: Backend, Clients
    • Labels: None

      Description

      Currently, if the Impala Coordinator or any Executors run into errors during query execution, Impala will fail the entire query. It would improve user experience to transparently retry the query for some transient, recoverable errors.

      This JIRA focuses on retrying queries that would otherwise fail due to cluster membership changes. Two cases are in scope: node failures that change the cluster membership (currently, the Coordinator cancels all queries running on a node once it detects that the node is no longer part of the cluster), and node blacklisting (the Coordinator blacklists a node because it detects a problem with that node, such as failing to execute RPCs against it). Retrying general errors (e.g. frontend errors, MemLimitExceeded exceptions) is out of scope.
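The scope described above can be illustrated with a minimal Python sketch. It is not Impala's implementation (the backend is C++), and every name in it (`FailureKind`, `QueryFailure`, `run_with_retry`, the `execute` callback) is hypothetical. It assumes a single retry by default and assumes that a retried attempt avoids the node that caused the failure; both are goals listed in the sub-tasks, not confirmed behavior.

```python
from enum import Enum, auto


class FailureKind(Enum):
    """Hypothetical failure categories, for illustration only."""
    NODE_LEFT_CLUSTER = auto()   # node is no longer part of the cluster membership
    NODE_BLACKLISTED = auto()    # Coordinator could not execute RPCs against the node
    MEM_LIMIT_EXCEEDED = auto()  # general error, out of scope for retries
    FRONTEND_ERROR = auto()      # general error, out of scope for retries


# Only failures caused by cluster membership changes are treated as retryable.
RETRYABLE = {FailureKind.NODE_LEFT_CLUSTER, FailureKind.NODE_BLACKLISTED}


class QueryFailure(Exception):
    """Hypothetical error carrying the failure kind and the offending node."""

    def __init__(self, kind, node=None):
        super().__init__(kind.name)
        self.kind = kind
        self.node = node


def run_with_retry(execute, max_retries=1):
    """Run execute(excluded_nodes), retrying only on retryable failures.

    execute is a hypothetical callback that runs the query while avoiding
    the nodes in excluded_nodes, and raises QueryFailure on error.
    """
    excluded = set()
    attempt = 0
    while True:
        try:
            return execute(frozenset(excluded))
        except QueryFailure as failure:
            # Non-retryable errors and exhausted retries surface to the client.
            if failure.kind not in RETRYABLE or attempt >= max_retries:
                raise
            # Assumption: the retried attempt should not use the failed node.
            if failure.node is not None:
                excluded.add(failure.node)
            attempt += 1
```

With this sketch, a query that first fails with `NODE_BLACKLISTED` on some node is run once more with that node excluded, while a `MEM_LIMIT_EXCEEDED` failure is raised to the caller immediately.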

          Issue Links

          1. Classify certain errors as retryable (Sub-task; Resolved; Sahil Takiar)
          2. Add support for single query retries on cluster membership changes (Sub-task; Resolved; Sahil Takiar)
          3. Avoid copying TExecRequest when retrying queries (Sub-task; Resolved; Sahil Takiar)
          4. Client logs should indicate if a query has been retried (Sub-task; Resolved; Quanlong Huang)
          5. Query progress bar freezes when a query is retried (Sub-task; Resolved; Quanlong Huang)
          6. ACID-query retry integration (Sub-task; Resolved; Sahil Takiar)
          7. Link failed and retried runtime profiles (Sub-task; Open; Sahil Takiar)
          8. Retried queries that blacklist nodes should ensure they don't run on the blacklisted node (Sub-task; Open; Wenzhe Zhou)
          9. Retryable queries should spool all results before returning any to the client (Sub-task; In Progress; Quanlong Huang)
          10. Retried runtime profile should include some information about previous query attempts (Sub-task; Open; Unassigned)
          11. Fix error reporting when AuxErrorInfoPB is present without an error (Sub-task; Open; Unassigned)
          12. Test coverage for query retries when there is a network partition (Sub-task; Open; Unassigned)
          13. Add impalad level metrics for query retries (Sub-task; Open; Unassigned)
          14. Queries should only be retried if all fragments fail with retryable errors (Sub-task; Open; Unassigned)
          15. Re-factor ImpalaServer, ClientRequestState, Coordinator protocol (Sub-task; Open; Sahil Takiar)
          16. Test that queries are not retried if they cause an impalad to crash (Sub-task; Open; Unassigned)
          17. Web UI improvements for retried queries (Sub-task; Open; Unassigned)
          18. Add support for multi query retries on cluster membership changes (Sub-task; Open; Unassigned)
          19. TSAN data race in QueryDriver::CreateRetriedClientRequestState (Sub-task; Resolved; Sahil Takiar)
          20. TSAN lock-order-inversion warning in QueryDriver::RetryQueryFromThread (Sub-task; Resolved; Sahil Takiar)
          21. summary and profile command in impala-shell should show both original and retried info (Sub-task; Open; Unassigned)
          22. Profile log does not include profiles of failed queries (Sub-task; Open; Unassigned)
          23. Impala Doc: Add docs for transparent query retries (Sub-task; Open; Unassigned)

              People

              • Assignee: Sahil Takiar (stakiar)
              • Reporter: Sahil Takiar (stakiar)
              • Votes: 0
              • Watchers: 4
