IMPALA-9124

Transparently retry queries that fail due to cluster membership changes


    Details

    • Type: New Feature
    • Status: In Progress
    • Priority: Critical
    • Resolution: Unresolved
    • Affects Version/s: None
    • Fix Version/s: None
    • Component/s: Backend, Clients
    • Labels: None

      Description

      Currently, if the Impala Coordinator or any Executor runs into an error during query execution, Impala fails the entire query. Transparently retrying the query on certain transient, recoverable errors would improve the user experience.

      This JIRA focuses on retrying queries that would otherwise fail due to cluster membership changes. Specifically, it covers two cases: node failures that change the cluster membership (currently, the Coordinator cancels all queries running on a node once it detects that the node is no longer part of the cluster), and node blacklisting (the Coordinator blacklists a node because it detects a problem with that node, namely that it cannot execute RPCs against the node). It does not cover retrying general errors (e.g., frontend errors or MemLimitExceeded exceptions).
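
      The scope described above can be illustrated with a small sketch. This is not Impala's implementation: the names ErrorKind, QueryError, RETRYABLE and run_with_retry are hypothetical and exist only for this example. It assumes that only the two membership-related failures are retryable, that a retry excludes the offending node, and that a query is retried at most once by default.

```python
from enum import Enum, auto


class ErrorKind(Enum):
    """Hypothetical error categories, named after the cases in the description."""
    NODE_FAILED = auto()         # executor dropped out of the cluster membership
    NODE_BLACKLISTED = auto()    # coordinator could not execute RPCs against the node
    FRONTEND_ERROR = auto()      # general error, out of scope for retries
    MEM_LIMIT_EXCEEDED = auto()  # general error, out of scope for retries


# Assumption: only membership-related failures are classified as retryable.
RETRYABLE = frozenset({ErrorKind.NODE_FAILED, ErrorKind.NODE_BLACKLISTED})


class QueryError(Exception):
    """Hypothetical query failure carrying its category and the node involved."""

    def __init__(self, kind, node=""):
        super().__init__(f"{kind.name} on node {node!r}")
        self.kind = kind
        self.node = node


def run_with_retry(execute, nodes, max_retries=1):
    """Call execute(nodes); on a retryable error, rerun without the bad node.

    Non-retryable errors, and retryable errors once the retry budget is
    spent, are re-raised so the caller sees the original failure.
    """
    available = set(nodes)
    attempts = 0
    while True:
        try:
            return execute(frozenset(available))
        except QueryError as err:
            if err.kind not in RETRYABLE or attempts >= max_retries:
                raise
            attempts += 1
            available.discard(err.node)  # do not schedule on the failed node again
            if not available:
                raise  # no executors left to retry on


if __name__ == "__main__":
    def flaky(nodes):
        # Simulated executor set: the query fails while "exec-2" participates.
        if "exec-2" in nodes:
            raise QueryError(ErrorKind.NODE_FAILED, "exec-2")
        return sorted(nodes)

    print(run_with_retry(flaky, ["exec-1", "exec-2", "exec-3"]))
```

      In this sketch the caller never sees the first failure: the query is rerun on the remaining nodes and only the final result (or a non-retryable error) is surfaced. A real system would also have to decide what to do when results have already been returned to the client, which the sketch does not model.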

        Issue Links

        1. Classify certain errors as retryable (Sub-task, Resolved, Sahil Takiar)
        2. Add support for single query retries on cluster membership changes (Sub-task, Resolved, Sahil Takiar)
        3. Avoid copying TExecRequest when retrying queries (Sub-task, Resolved, Sahil Takiar)
        4. Client logs should indicate if a query has been retried (Sub-task, Resolved, Quanlong Huang)
        5. Query progress bar freezes when a query is retried (Sub-task, Resolved, Quanlong Huang)
        6. ACID-query retry integration (Sub-task, Resolved, Sahil Takiar)
        7. Link failed and retried runtime profiles (Sub-task, Resolved, Sahil Takiar)
        8. Retried queries that blacklist nodes should ensure they don't run on the blacklisted node (Sub-task, Resolved, Wenzhe Zhou)
        9. Retryable queries should spool all results before returning any to the client (Sub-task, Resolved, Quanlong Huang)
        10. TSAN data race in QueryDriver::CreateRetriedClientRequestState (Sub-task, Resolved, Sahil Takiar)
        11. TSAN lock-order-inversion warning in QueryDriver::RetryQueryFromThread (Sub-task, Resolved, Sahil Takiar)
        12. Hit DCHECK when retrying a query in FINISHED state (Sub-task, Resolved, Quanlong Huang)
        13. summary and profile command in impala-shell should show both original and retried info (Sub-task, Resolved, Quanlong Huang)
        14. Fix error reporting when AuxErrorInfoPB is present without an error (Sub-task, Open, Wenzhe Zhou)
        15. Test coverage for query retries when there is a network partition (Sub-task, Open, Wenzhe Zhou)
        16. Retried runtime profile should include some information about previous query attempts (Sub-task, Open, Unassigned)
        17. Add impalad level metrics for query retries (Sub-task, Open, Unassigned)
        18. Queries should only be retried if all fragments fail with retryable errors (Sub-task, Open, Unassigned)
        19. Re-factor ImpalaServer, ClientRequestState, Coordinator protocol (Sub-task, Open, Unassigned)
        20. Test that queries are not retried if they cause an impalad to crash (Sub-task, Open, Unassigned)
        21. Web UI improvements for retried queries (Sub-task, Open, Unassigned)
        22. Add support for multi query retries on cluster membership changes (Sub-task, Open, Unassigned)
        23. Profile log does not include profiles of failed queries (Sub-task, Open, Unassigned)
        24. Impala Doc: Add docs for transparent query retries (Sub-task, Open, shajini thayasingh)
        25. Consider using num_rows_fetched instead of fetched_rows in checking whether client has fetched any results in TryQueryRetry (Sub-task, Open, Unassigned)

            People

            • Assignee: Sahil Takiar (stakiar)
            • Reporter: Sahil Takiar (stakiar)

