Details

    • Sub-task
    • Status: Resolved
    • Critical
    • Resolution: Fixed
    • Impala 4.0.0

    Description

      A query enters the FINISHED state as soon as rows are available, regardless of whether the client has fetched any results. If the client hasn't called fetch on the query, the query should still be retryable. However, retrying such a query hits a DCHECK at https://github.com/apache/impala/blob/a0057788c5c2300f58b6615a27116b8331171e06/be/src/runtime/query-driver.cc#L131-L135

      This can be reproduced by modifying test_retries_from_cancellation_pool in tests/custom_cluster/test_query_retries.py:

      diff --git a/tests/custom_cluster/test_query_retries.py b/tests/custom_cluster/test_query_retries.py
      index 54f2334..ae57068 100644
      --- a/tests/custom_cluster/test_query_retries.py
      +++ b/tests/custom_cluster/test_query_retries.py
      @@ -69,21 +69,23 @@ class TestQueryRetries(CustomClusterTestSuite):
           # The following query executes slowly, and does minimal TransmitData RPCs, so it is
           # likely that the statestore detects that the impalad has been killed before a
           # TransmitData RPC has occurred.
      -    query = "select count(*) from functional.alltypes where bool_col = sleep(50)"
      +    query = "select count(*) from functional.alltypestiny union all select count(*) from functional.alltypes where bool_col = sleep(50)"
       
           # Launch the query, wait for it to start running, and then kill an impalad.
           handle = self.execute_query_async(query,
               query_options={'retry_failed_queries': 'true'})
      -    self.wait_for_state(handle, self.client.QUERY_STATES['RUNNING'], 60)
      +    self.wait_for_state(handle, self.client.QUERY_STATES['FINISHED'], 60)
       
           # Kill a random impalad (but not the one executing the actual query).
           self.__kill_random_impalad()
      +    time.sleep(10)
       
           # Validate the query results.
           results = self.client.fetch(query, handle)
           assert results.success
      -    assert len(results.data) == 1
      -    assert int(results.data[0]) == 3650
      +    assert len(results.data) == 2
      +    assert int(results.data[0]) == 8
      +    assert int(results.data[1]) == 3650
       
           # Validate the live exec summary.
           retried_query_id = self.__get_retried_query_id_from_summary(handle)
      

      The change switches to a query that has two UNION operands. The query enters the FINISHED state once the first operand finishes, even though the second operand is still running. When we then kill an impalad, the coordinator hits the DCHECK.
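
      To illustrate the state transition described here, the following is a minimal toy model, not Impala code. The class ToyUnionQuery and its methods are hypothetical names invented for this sketch; the row values 8 and 3650 are the counts asserted in the modified test.

```python
from enum import Enum


class QueryState(Enum):
    RUNNING = "RUNNING"
    FINISHED = "FINISHED"


class ToyUnionQuery:
    """Hypothetical model of a UNION ALL query with several operands."""

    def __init__(self, operand_results):
        # One list of rows per UNION operand, in evaluation order.
        self.pending_operands = list(operand_results)
        self.available_rows = []
        self.state = QueryState.RUNNING

    def finish_next_operand(self):
        """Complete one operand and make its rows available to the client."""
        rows = self.pending_operands.pop(0)
        self.available_rows.extend(rows)
        if self.available_rows:
            # Rows are available, so the query is reported as FINISHED
            # even if other operands are still executing.
            self.state = QueryState.FINISHED

    def still_executing(self):
        """True while at least one operand has not completed."""
        return len(self.pending_operands) > 0


# First operand (alltypestiny count) completes; the sleep() operand does not.
query = ToyUnionQuery([[8], [3650]])
query.finish_next_operand()
print(query.state.value, query.still_executing())
```

      In this model the query reports FINISHED while one operand is still pending, which is the window in which the modified test kills an impalad.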

      We should support retrying a FINISHED (but actually running) query that hasn't returned any results. This is required by IMPALA-9225.
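
      The requested behavior can be sketched as a small predicate. This is an assumption-laden illustration, not the actual check in query-driver.cc: ToyQuery, its fields, and is_retryable are hypothetical names, and only the RUNNING and FINISHED states mentioned in this issue are modeled.

```python
from dataclasses import dataclass


@dataclass
class ToyQuery:
    """Hypothetical stand-in for the coordinator's view of a query."""
    state: str                         # "RUNNING" or "FINISHED" in this sketch
    rows_fetched: int = 0              # rows already returned to the client
    retry_failed_queries: bool = True  # mirrors the query option used in the test


def is_retryable(query):
    """Return True if a failed query could be retried transparently."""
    if not query.retry_failed_queries:
        return False
    if query.rows_fetched > 0:
        # The client has already seen results; a retry could change them.
        return False
    # FINISHED only means rows are available, so it is treated like RUNNING
    # as long as nothing has been fetched.
    return query.state in ("RUNNING", "FINISHED")


print(is_retryable(ToyQuery(state="FINISHED")))
print(is_retryable(ToyQuery(state="FINISHED", rows_fetched=1)))
```

      The key design point in this sketch is that eligibility depends on whether any rows reached the client, not on the FINISHED state alone.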

          People

            stigahuang Quanlong Huang