IMPALA-5724

test_union hangs in exhaustive test run

    Details

    • Type: Bug
    • Status: Resolved
    • Priority: Critical
    • Resolution: Duplicate
    • Affects Version/s: Impala 2.10.0
    • Fix Version/s: None
    • Component/s: Distributed Exec
    • Labels:
    • Environment:
      rhel7

      Description

      On a recent exhaustive Jenkins run (on rhel7), TestStatestore timed out:

      08:41:31 [gw2] PASSED unittests/test_file_parser.py::TestTestFileParser::test_parse_commented_out_test_as_comment 
      08:41:31 unittests/test_result_verifier.py::TestResultVerifier::test_result_row_indexing[exec_option: {'batch_size': 0, 'num_nodes': 0, 'disable_codegen_rows_threshold': 0, 'disable_codegen': True, 'abort_on_error': 1, 'exec_single_node_rows_threshold': 0} | table_format: text/none] 
      08:41:31 [gw2] PASSED unittests/test_result_verifier.py::TestResultVerifier::test_result_row_indexing[exec_option: {'batch_size': 0, 'num_nodes': 0, 'disable_codegen_rows_threshold': 0, 'disable_codegen': True, 'abort_on_error': 1, 'exec_single_node_rows_threshold': 0} | table_format: text/none] 
      08:41:31 [gw3] PASSED statestore/test_statestore.py::TestStatestore::test_update_is_delta 
      08:41:45 [gw0] PASSED statestore/test_statestore.py::TestStatestore::test_failure_detected Build timed out (after 1,440 minutes). Marking the build as failed.
      20:56:59 Build was aborted
      20:56:59 Archiving artifacts
      20:56:59 
      20:56:59 [gw0] node down: Not properly terminated
      20:56:59 [gw0] FAILED statestore/test_statestore.py::TestStatestore::test_topic_persistence 
      20:56:59 Replacing crashed slave gw0
      20:57:05 Recording test results
      20:57:08 Email was triggered for: Failure
      20:57:08 Sending email for trigger: Failure
      20:57:08 Sending email to: impala-jenkins@cloudera.com
      20:57:08 
      20:57:08 Deleting project workspace... 
      20:57:08 done
      20:57:08 
      20:57:08 Finished: FAILURE
      

      The statestore logs show many errors like the following:

      I0726 08:24:01.102785 30978 statestore.cc:526] Preparing initial test_skipped_b1501e92-7215-11e7-a5fa-02581563417c topic update for python-test-client-b1507018-7215-11e7-a5fa-02581563417c. Size = 8.00 B
      I0726 08:24:01.103085 30978 thrift-util.cc:123] TSocket::open() connect() <Host: localhost Port: 45518>Connection refused
      I0726 08:24:01.415092 30978 status.cc:55] RPC Error: Client for localhost:45518 hits an unexpected exception: TProtocolException: Invalid data, type: N6apache6thrift8protocol18TProtocolExceptionE rpc send completed: true
          @          0x12590d6  impala::Status::Status()
          @          0x15ee502  impala::ClientConnection<>::DoRpc<>()
          @          0x15e7431  impala::Statestore::SendTopicUpdate()
          @          0x15e9610  impala::Statestore::DoSubscriberUpdate()
          @          0x15fecfe  boost::_mfi::mf3<>::operator()()
          @          0x15fd5a5  boost::_bi::list4<>::operator()<>()
          @          0x15fb3de  boost::_bi::bind_t<>::operator()<>()
          @          0x15f88d3  boost::detail::function::void_function_obj_invoker2<>::invoke()
          @          0x15f4dfd  boost::function2<>::operator()()
          @          0x15f06ef  impala::ThreadPool<>::WorkerThread()
          @          0x160038d  boost::_mfi::mf1<>::operator()()
          @          0x15ffe17  boost::_bi::list2<>::operator()<>()
          @          0x15fedfd  boost::_bi::bind_t<>::operator()()
          @          0x15fd88c  boost::detail::function::void_function_obj_invoker0<>::invoke()
          @          0x13d6148  boost::function0<>::operator()()
          @          0x16a7031  impala::Thread::SuperviseThread()
          @          0x16afb38  boost::_bi::list4<>::operator()<>()
          @          0x16afa7b  boost::_bi::bind_t<>::operator()()
          @          0x16afa3e  boost::detail::thread_data<>::run()
          @          0x1ba055a  thread_proxy
          @     0x7f23cdfa9df3  start_thread
          @     0x7f23cdcd71ad  __clone
      I0726 08:24:01.415179 30978 client-cache.cc:170] Broken Connection, destroy client for localhost:45518
      I0726 08:24:01.415273 30978 statestore.cc:697] Unable to send topic update message to subscriber python-test-client-b1507018-7215-11e7-a5fa-02581563417c, received error: RPC Error: Client for localhost:45518 hits an unexpected exception: TProtocolException: Invalid data, type: N6apache6thrift8protocol18TProtocolExceptionE rpc send completed: true
      

      I've attached the full statestored log.
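
To gauge how widespread these failures are in the attached log, something like the following could tally the failed topic updates per subscriber. This is only a rough sketch: the regular expression is modeled on the single statestore.cc:697 line quoted above, and the function name `count_failed_updates` and the command-line usage are hypothetical, not part of any Impala tooling.

```python
import re
import sys
from collections import Counter

# Pattern modeled on the statestore.cc:697 message quoted in this report.
# Assumption: other failed updates in the log follow the same wording.
FAILED_UPDATE = re.compile(
    r"Unable to send topic update message to subscriber "
    r"(?P<subscriber>[^,\s]+), received error: (?P<error>.*)"
)


def count_failed_updates(lines):
    """Return a Counter mapping subscriber id to its number of failed updates."""
    counts = Counter()
    for line in lines:
        match = FAILED_UPDATE.search(line)
        if match:
            counts[match.group("subscriber")] += 1
    return counts


# Hypothetical usage: python count_failed_updates.py statestored.log
if __name__ == "__main__" and len(sys.argv) > 1:
    with open(sys.argv[1], errors="replace") as log:
        for subscriber, n in count_failed_updates(log).most_common(10):
            print(n, subscriber)
```

A subscriber that keeps accumulating failures long after its test finished would suggest the statestore is still retrying a client that has already gone away.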

        Attachments

        1. statestored.log
          449 kB
          Matthew Jacobs

              People

              • Assignee: Sailesh Mukil
              • Reporter: Matthew Jacobs