IMPALA-5198

Error messages are sometimes dropped before reaching client

      Description

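      In the nightly core S3 tests, the custom cluster test test_exchange_small_delay (in test_exchange_delays.py) failed. The query failed, but the client saw only 'Query aborted: (1 of 3 similar)' rather than the expected 'Sender timed out waiting for receiver fragment instance' error, so the real cause needs a closer look.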

      07:56:58 =================================== FAILURES ===================================
      07:56:58  TestExchangeDelays.test_exchange_small_delay[exec_option: {'disable_codegen': False, 'abort_on_error': 1, 'exec_single_node_rows_threshold': 0, 'batch_size': 0, 'num_nodes': 0} | table_format: text/none] 
      07:56:58 
      07:56:58 self = <test_exchange_delays.TestExchangeDelays object at 0x3f054d0>
      07:56:58 vector = <tests.common.test_vector.ImpalaTestVector object at 0x5a32750>
      07:56:58 
      07:56:58     @pytest.mark.execute_serially
      07:56:58     @CustomClusterTestSuite.with_args("--stress_datastream_recvr_delay_ms=10000"
      07:56:58           " --datastream_sender_timeout_ms=5000")
      07:56:58     def test_exchange_small_delay(self, vector):
      07:56:58       """Test delays in registering data stream receivers where the first one or two
      07:56:58         batches will time out before the receiver registers, but subsequent batches will
      07:56:58         arrive after the receiver registers. Before IMPALA-2987, this scenario resulted in
      07:56:58         incorrect results.
      07:56:58         """
      07:56:58 >     self.run_test_case('QueryTest/exchange-delays', vector)
      07:56:58 
      07:56:58 custom_cluster/test_exchange_delays.py:39: 
      07:56:58 _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ 
      07:56:58 common/impala_test_suite.py:362: in run_test_case
      07:56:58     self.__verify_exceptions(test_section['CATCH'], str(e), use_db)
      07:56:58 _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ 
      07:56:58 
      07:56:58 self = <test_exchange_delays.TestExchangeDelays object at 0x3f054d0>
      07:56:58 expected_strs = ['Sender timed out waiting for receiver fragment instance\n']
      07:56:58 actual_str = 'ImpalaBeeswaxException: Query aborted: (1 of 3 similar)'
      07:56:58 use_db = None
      07:56:58 
      07:56:58     def __verify_exceptions(self, expected_strs, actual_str, use_db):
      07:56:58       """
      07:56:58         Verifies that at least one of the strings in 'expected_str' is a substring of the
      07:56:58         actual exception string 'actual_str'.
      07:56:58         """
      07:56:58       actual_str = actual_str.replace('\n', '')
      07:56:58       for expected_str in expected_strs:
      07:56:58         # In error messages, some paths are always qualified and some are not.
      07:56:58         # So, allow both $NAMENODE and $FILESYSTEM_PREFIX to be used in CATCH.
      07:56:58         expected_str = expected_str.strip() \
      07:56:58             .replace('$FILESYSTEM_PREFIX', FILESYSTEM_PREFIX) \
      07:56:58             .replace('$NAMENODE', NAMENODE) \
      07:56:58             .replace('$IMPALA_HOME', IMPALA_HOME)
      07:56:58         if use_db: expected_str = expected_str.replace('$DATABASE', use_db)
      07:56:58         # Strip newlines so we can split error message into multiple lines
      07:56:58         expected_str = expected_str.replace('\n', '')
      07:56:58         if expected_str in actual_str: return
      07:56:58       assert False, 'Unexpected exception string. Expected: %s\nNot found in actual: %s' % \
      07:56:58 >       (expected_str, actual_str)
      07:56:58 E     AssertionError: Unexpected exception string. Expected: Sender timed out waiting for receiver fragment instance
      07:56:58 E     Not found in actual: ImpalaBeeswaxException: Query aborted: (1 of 3 similar)
      07:56:58 
      07:56:58 common/impala_test_suite.py:253: AssertionError
      
      07:56:58 ----------------------------- Captured stderr call -----------------------------
      07:56:58 -- executing against localhost:21000
      07:56:58 use functional;
      07:56:58 
      07:56:58 SET disable_codegen=False;
      07:56:58 SET abort_on_error=1;
      07:56:58 SET exec_single_node_rows_threshold=0;
      07:56:58 SET batch_size=0;
      07:56:58 SET num_nodes=0;
      07:56:58 -- executing against localhost:21000
      07:56:58 select count(*)
      07:56:58 from tpch.lineitem
      07:56:58   inner join tpch.orders on l_orderkey = o_orderkey;
      07:56:58 
      07:56:58 ======== 1 failed, 43 passed, 20 skipped, 8 xfailed in 3213.63 seconds =========
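
The assertion in the traceback is a plain substring check, which is why a generic 'Query aborted' message fails it. A minimal standalone sketch of that check, assuming a simplified helper (the name `matches_expected` is hypothetical, and the $FILESYSTEM_PREFIX / $NAMENODE / $IMPALA_HOME / $DATABASE substitutions from the real helper are left out):

```python
def matches_expected(expected_strs, actual_str):
    """Return True if any expected string is a substring of actual_str.

    Simplified restatement of the __verify_exceptions logic shown in the
    traceback: newlines are stripped from both sides before comparing.
    Placeholder substitution is intentionally omitted in this sketch.
    """
    actual_str = actual_str.replace('\n', '')
    for expected_str in expected_strs:
        expected_str = expected_str.strip().replace('\n', '')
        if expected_str in actual_str:
            return True
    return False


# Values copied from the failure output above.
expected = ['Sender timed out waiting for receiver fragment instance\n']
actual = 'ImpalaBeeswaxException: Query aborted: (1 of 3 similar)'

# The client-visible message lacks the sender timeout text, so no match.
result = matches_expected(expected, actual)
```

Under this sketch the check passes only if the sender timeout text itself reaches the client, which matches the issue title: the specific error appears to have been dropped and replaced by the generic abort message.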
      

              People

              • Assignee:
                Sailesh Mukil (sailesh)
                Reporter:
                Matthew Jacobs (mjacobs)