Uploaded image for project: 'IMPALA'
  1. IMPALA
  2. IMPALA-9124 Transparently retry queries that fail due to cluster membership changes
  3. IMPALA-9254

Queries should only be retried if all fragments fail with retryable errors

    XMLWordPrintableJSON

    Details

    • Type: Sub-task
    • Status: Open
    • Priority: Major
    • Resolution: Unresolved
    • Affects Version/s: None
    • Fix Version/s: None
    • Component/s: Distributed Exec
    • Labels:
      None
    • Epic Color:
      ghx-label-4

      Description

      Currently, Impala only propagates an overall_status from an executor to the coordinator. The overall_status is set in the QueryState and "If multiple fragments have errors, the first fragment to hit an error is given preference.".

      The issue is that if multiple fragments fail, it is possible some of the errors should trigger a retry, while other errors shouldn't. For example, one fragment could fail due to faulty disks, but others could fail due to mem limit exceptions. These types of queries shouldn't be retried because it is likely the query will just fail again.

      This can only happen if the non-retryable error occurs in a specific time window: [when the retryable error occurs, the query is cancelled]. Since any fragment failure causes the entire query to be cancelled, this can only occur if the non-retryable error occurs after the retryable error, but before the query is cancelled.

        Attachments

          Activity

            People

            • Assignee:
              Unassigned
              Reporter:
              stakiar Sahil Takiar
            • Votes:
              0 Vote for this issue
              Watchers:
              2 Start watching this issue

              Dates

              • Created:
                Updated: