IMPALA-9124

Transparently retry queries that fail due to cluster membership changes

Details

    • Type: New Feature
    • Status: In Progress
    • Priority: Critical
    • Resolution: Unresolved
    • Affects Version/s: None
    • Fix Version/s: None
    • Component/s: Backend, Clients
    • Labels: None

    Description

      Currently, if the Impala Coordinator or any Executor runs into an error during query execution, Impala fails the entire query. Transparently retrying the query on certain transient, recoverable errors would improve the user experience.

      This JIRA focuses on retrying queries that would otherwise fail due to cluster membership changes. Specifically, it covers two cases: node failures that change the cluster membership (currently, the Coordinator cancels all queries running on a node when it detects that the node is no longer part of the cluster) and node blacklisting (the Coordinator blacklists a node because it detects a problem with that node, such as being unable to execute RPCs against it). It does not cover retrying general errors (e.g. frontend errors, MemLimitExceeded exceptions).
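
      The scope described above can be illustrated with a minimal sketch. It is not Impala's implementation (the backend is written in C++); every name here (`ErrorKind`, `QueryError`, `run_with_retry`, `flaky_execute`, the `exec-N` node names) is hypothetical and chosen only for illustration. The sketch assumes a single retry, that only membership-change and blacklisting errors are classified as retryable, and that the retried attempt avoids the node that caused the failure.

```python
from enum import Enum, auto


class ErrorKind(Enum):
    NODE_LEFT_CLUSTER = auto()   # node is no longer part of the cluster membership
    NODE_BLACKLISTED = auto()    # coordinator could not execute RPCs against the node
    MEM_LIMIT_EXCEEDED = auto()  # general error: out of scope for this JIRA
    FRONTEND_ERROR = auto()      # general error: out of scope for this JIRA


# Assumption: only cluster-membership related errors are retryable.
RETRYABLE = {ErrorKind.NODE_LEFT_CLUSTER, ErrorKind.NODE_BLACKLISTED}


class QueryError(Exception):
    """Hypothetical query failure carrying its classification and failing node."""

    def __init__(self, kind, node=None):
        super().__init__(f"{kind.name} on {node}")
        self.kind = kind
        self.node = node


def run_with_retry(execute, nodes, max_retries=1):
    """Call execute(nodes); on a retryable error, drop the bad node and retry.

    Returns (result, number_of_attempts). Non-retryable errors, and retryable
    errors once the retry budget is used up, propagate to the caller.
    """
    healthy = list(nodes)
    attempts = 0
    while True:
        attempts += 1
        try:
            return execute(healthy), attempts
        except QueryError as err:
            if err.kind not in RETRYABLE or attempts > max_retries:
                raise
            # Schedule the retry without the node that failed or was blacklisted.
            healthy = [n for n in healthy if n != err.node]
            if not healthy:
                raise


def flaky_execute(nodes):
    """Stand-in for query execution: fails while 'exec-2' is still scheduled."""
    if "exec-2" in nodes:
        raise QueryError(ErrorKind.NODE_BLACKLISTED, node="exec-2")
    return sorted(nodes)


result, attempts = run_with_retry(flaky_execute, ["exec-1", "exec-2", "exec-3"])
print(result, attempts)
```

      In this sketch the client sees only the result of the second attempt, which runs on the remaining nodes. An error such as `MEM_LIMIT_EXCEEDED` is raised immediately, matching the statement that general errors are out of scope.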

        Sub-Tasks

          1. Classify certain errors as retryable (Sub-task, Resolved, Sahil Takiar)
          2. Add support for single query retries on cluster membership changes (Sub-task, Resolved, Sahil Takiar)
          3. Avoid copying TExecRequest when retrying queries (Sub-task, Resolved, Sahil Takiar)
          4. Client logs should indicate if a query has been retried (Sub-task, Resolved, Quanlong Huang)
          5. Query progress bar freezes when a query is retried (Sub-task, Resolved, Quanlong Huang)
          6. ACID-query retry integration (Sub-task, Resolved, Sahil Takiar)
          7. Link failed and retried runtime profiles (Sub-task, Resolved, Sahil Takiar)
          8. Retried queries that blacklist nodes should ensure they don't run on the blacklisted node (Sub-task, Resolved, Wenzhe Zhou)
          9. Retryable queries should spool all results before returning any to the client (Sub-task, Resolved, Quanlong Huang)
          10. TSAN data race in QueryDriver::CreateRetriedClientRequestState (Sub-task, Resolved, Sahil Takiar)
          11. TSAN lock-order-inversion warning in QueryDriver::RetryQueryFromThread (Sub-task, Resolved, Sahil Takiar)
          12. Hit DCHECK when retrying a query in FINISHED state (Sub-task, Resolved, Quanlong Huang)
          13. summary and profile command in impala-shell should show both original and retried info (Sub-task, Resolved, Quanlong Huang)
          14. Fix error reporting when AuxErrorInfoPB is present without an error (Sub-task, Open, Wenzhe Zhou)
          15. Test coverage for query retries when there is a network partition (Sub-task, Open, Wenzhe Zhou)
          16. Retried runtime profile should include some information about previous query attempts (Sub-task, Open, Unassigned)
          17. Add impalad level metrics for query retries (Sub-task, Open, Unassigned)
          18. Queries should only be retried if all fragments fail with retryable errors (Sub-task, Open, Unassigned)
          19. Re-factor ImpalaServer, ClientRequestState, Coordinator protocol (Sub-task, Open, Unassigned)
          20. Test that queries are not retried if they cause an impalad to crash (Sub-task, Open, Unassigned)
          21. Web UI improvements for retried queries (Sub-task, Open, Unassigned)
          22. Add support for multi query retries on cluster membership changes (Sub-task, Open, Unassigned)
          23. Profile log does not include profiles of failed queries (Sub-task, Open, Unassigned)
          24. Impala Doc: Add docs for transparent query retries (Sub-task, Open, shajini thayasingh)
          25. Consider using num_rows_fetched instead of fetched_rows in checking whether client has fetched any results in TryQueryRetry (Sub-task, Open, Unassigned)

            People

              Assignee: Sahil Takiar (stakiar)
              Reporter: Sahil Takiar (stakiar)
              Votes: 0
              Watchers: 7

              Dates

                Created:
                Updated: