IMPALA-7931

test_shutdown_executor fails with timeout waiting for query target state


Details

    Description

      On a recent S3 test run, test_shutdown_executor timed out waiting for a query to reach state 4 (FINISHED). Instead, the query ended up in state 5 (EXCEPTION).

      12:51:11 __________________ TestShutdownCommand.test_shutdown_executor __________________
      12:51:11 custom_cluster/test_restart_services.py:209: in test_shutdown_executor
      12:51:11     assert self.__fetch_and_get_num_backends(QUERY, before_shutdown_handle) == 3
      12:51:11 custom_cluster/test_restart_services.py:356: in __fetch_and_get_num_backends
      12:51:11     self.client.QUERY_STATES['FINISHED'], timeout=20)
      12:51:11 common/impala_service.py:267: in wait_for_query_state
      12:51:11     target_state, query_state)
      12:51:11 E   AssertionError: Did not reach query state in time target=4 actual=5
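
For readers unfamiliar with this kind of helper, the assertion above is what a poll-until-timeout wait produces when the query settles in a different state. The following is a minimal sketch of that pattern, not the actual code in common/impala_service.py: the function name `wait_for_state`, the `get_state` callback, and the `interval` parameter are hypothetical, and the state numbers 4 and 5 are taken from the assertion message.

```python
import time

# State numbers as they appear in the failure message (target=4 actual=5).
FINISHED = 4
EXCEPTION = 5


def wait_for_state(get_state, target, timeout, interval=0.01):
    """Poll get_state() until it returns `target` or `timeout` seconds elapse.

    Hypothetical helper for illustration only. `get_state` is any
    zero-argument callable that returns the current numeric query state.
    """
    deadline = time.monotonic() + timeout
    state = get_state()
    while state != target and time.monotonic() < deadline:
        time.sleep(interval)
        state = get_state()
    # A query stuck in EXCEPTION never reaches FINISHED, so the loop runs
    # until the deadline and the assertion reports the last state seen.
    assert state == target, (
        "Did not reach query state in time target=%d actual=%d" % (target, state))
    return state
```

In this sketch, a query that has already failed still consumes the full timeout before the assertion fires, which is consistent with the test reporting a timeout rather than the underlying query error.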
      

      From the logs I can see that the query fails because one of the executors becomes unreachable:

      I1204 12:31:39.954125  5609 impala-server.cc:1792] Query a34c3a84775e5599:b2b25eb900000000: Failed due to unreachable impalad(s): jenkins-worker:22001
      

      The query was select count(*) from functional_parquet.alltypes where sleep(1) = bool_col.

      It seems that the query took longer than expected and was still running when the executor shut down.

      I can reproduce this by adding a sleep to the test:

      diff --git a/tests/custom_cluster/test_restart_services.py b/tests/custom_cluster/test_restart_services.py
      index e441cbc..32bc8a1 100644
      --- a/tests/custom_cluster/test_restart_services.py
      +++ b/tests/custom_cluster/test_restart_services.py
      @@ -206,7 +206,7 @@ class TestShutdownCommand(CustomClusterTestSuite, HS2TestSuite):
           after_shutdown_handle = self.__exec_and_wait_until_running(QUERY)
       
           # Finish executing the first query before the backend exits.
      -    assert self.__fetch_and_get_num_backends(QUERY, before_shutdown_handle) == 3
      +    assert self.__fetch_and_get_num_backends(QUERY, before_shutdown_handle, delay=5) == 3
       
           # Wait for the impalad to exit, then start it back up and run another query, which
           # should be scheduled on it again.
      @@ -349,11 +349,14 @@ class TestShutdownCommand(CustomClusterTestSuite, HS2TestSuite):
                       self.client.QUERY_STATES['RUNNING'], timeout=20)
           return handle
       
      -  def __fetch_and_get_num_backends(self, query, handle):
      +  def __fetch_and_get_num_backends(self, query, handle, delay=0):
           """Fetch the results of 'query' from the beeswax handle 'handle', close the
           query and return the number of backends obtained from the profile."""
           self.impalad_test_service.wait_for_query_state(self.client, handle,
                       self.client.QUERY_STATES['FINISHED'], timeout=20)
      +    if delay > 0:
      +      LOG.info("sleeping for {0}".format(delay))
      +      time.sleep(delay)
           self.client.fetch(query, handle)
           profile = self.client.get_runtime_profile(handle)
           self.client.close_query(handle)
      

      Attachments

        1. impala-7931-impalad-logs.tar.gz
          105 kB
          Tim Armstrong

          People

            tarmstrong Tim Armstrong
            lv Lars Volker