Details
Type: Sub-task
Status: Resolved
Priority: Critical
Resolution: Fixed
Description
A query enters the FINISHED state as soon as rows are available, regardless of whether the client has fetched any results. If the client has not yet called fetch on the query, the query should still be retryable. However, retrying such a query hits a DCHECK at https://github.com/apache/impala/blob/a0057788c5c2300f58b6615a27116b8331171e06/be/src/runtime/query-driver.cc#L131-L135
This can be reproduced by modifying test_retries_from_cancellation_pool in tests/custom_cluster/test_query_retries.py:
diff --git a/tests/custom_cluster/test_query_retries.py b/tests/custom_cluster/test_query_retries.py
index 54f2334..ae57068 100644
--- a/tests/custom_cluster/test_query_retries.py
+++ b/tests/custom_cluster/test_query_retries.py
@@ -69,21 +69,23 @@ class TestQueryRetries(CustomClusterTestSuite):
     # The following query executes slowly, and does minimal TransmitData RPCs, so it is
     # likely that the statestore detects that the impalad has been killed before a
     # TransmitData RPC has occurred.
-    query = "select count(*) from functional.alltypes where bool_col = sleep(50)"
+    query = "select count(*) from functional.alltypestiny union all select count(*) from functional.alltypes where bool_col = sleep(50)"
 
     # Launch the query, wait for it to start running, and then kill an impalad.
     handle = self.execute_query_async(query,
         query_options={'retry_failed_queries': 'true'})
-    self.wait_for_state(handle, self.client.QUERY_STATES['RUNNING'], 60)
+    self.wait_for_state(handle, self.client.QUERY_STATES['FINISHED'], 60)
 
     # Kill a random impalad (but not the one executing the actual query).
     self.__kill_random_impalad()
+    time.sleep(10)
 
     # Validate the query results.
     results = self.client.fetch(query, handle)
     assert results.success
-    assert len(results.data) == 1
-    assert int(results.data[0]) == 3650
+    assert len(results.data) == 2
+    assert int(results.data[0]) == 8
+    assert int(results.data[1]) == 3650
 
     # Validate the live exec summary.
     retried_query_id = self.__get_retried_query_id_from_summary(handle)
The change uses a different query that has two UNION operands. The query enters the FINISHED state once the first operand finishes, while the second operand is still running. When an impalad is then killed, the coordinator hits the DCHECK.
We should support retrying a FINISHED (but actually running) query that hasn't returned any results. This is required by IMPALA-9225.
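To make the requested behavior concrete, here is a minimal Python sketch of the retryability rule described above. It is only a toy model: QueryState, QueryModel, rows_fetched, and both helper functions are hypothetical names invented for illustration and do not correspond to Impala's actual classes or to the code in query-driver.cc.

```python
from dataclasses import dataclass
from enum import Enum


class QueryState(Enum):
    """Hypothetical subset of client-visible query states."""
    RUNNING = "RUNNING"
    FINISHED = "FINISHED"


@dataclass
class QueryModel:
    """Toy stand-in for a coordinator-side query (not an Impala class)."""
    state: QueryState
    rows_fetched: int = 0  # rows already returned to the client


def is_retryable_by_state(query: QueryModel) -> bool:
    # Assumed stand-in for a state-only check: it rejects any FINISHED
    # query, even one whose results the client has never fetched.
    return query.state == QueryState.RUNNING


def is_retryable_by_results(query: QueryModel) -> bool:
    # Rule requested in this issue: a query stays retryable until the
    # client has received at least one row, whatever its state is.
    return query.rows_fetched == 0


# The UNION query from the reproduction: FINISHED after the first operand,
# but the client has not fetched anything yet.
unfetched = QueryModel(state=QueryState.FINISHED, rows_fetched=0)
# The same query after the client has fetched one row.
fetched = QueryModel(state=QueryState.FINISHED, rows_fetched=1)
```

Under these assumptions, the state-only check refuses to retry `unfetched`, while the results-based check allows it and still refuses `fetched`, since rows already handed to the client cannot be taken back.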