Currently, if the Impala Coordinator or any Executors run into errors during query execution, Impala will fail the entire query. It would improve user experience to transparently retry the query for some transient, recoverable errors.
This JIRA focuses on retrying queries that would otherwise fail due to cluster membership changes. Specifically, node failures that cause changes in the cluster membership (currently the Coordinator cancels all queries running on a node if it detects that the node is no longer part of the cluster) and node blacklisting (the Coordinator blacklists a node because it detects a problem with that node - can’t execute RPCs against the node). It is not focused on retrying general errors (e.g. any frontend errors, MemLimitExceeded exceptions, etc.).