Details
Type: Bug
Status: Resolved
Priority: Blocker
Resolution: Fixed
Impala 3.1.0
None
Description
Recent builds have failed because of an error in test_breakpad.py. Assigning to Tim as the person who most recently touched this file.
Test output:
09:04:35 ==================================== ERRORS ====================================
09:04:35 ___ ERROR at teardown of TestBreakpadExhaustive.test_minidump_cleanup_thread ___
09:04:35 custom_cluster/test_breakpad.py:49: in teardown_method
09:04:35     self.kill_cluster(SIGKILL)
09:04:35 custom_cluster/test_breakpad.py:80: in kill_cluster
09:04:35     self.kill_processes(processes, signal)
09:04:35 custom_cluster/test_breakpad.py:85: in kill_processes
09:04:35     process.kill(signal)
09:04:35 common/impala_cluster.py:330: in kill
09:04:35     assert 0, "No processes %s found" % self.cmd
09:04:35 E AssertionError: No processes ['/data/jenkins/workspace/impala-cdh6.x-exhaustive-release/repos/Impala/be/build/latest/service/impalad', '-kudu_client_rpc_timeout_ms', '0', '-kudu_master_hosts', 'localhost', '--mem_limit=12884901888', '-logbufsecs=5', '-v=1', '-max_log_files=0', '-log_filename=impalad', '-log_dir=/data/jenkins/workspace/impala-cdh6.x-exhaustive-release/repos/Impala/logs/custom_cluster_tests', '-beeswax_port=21000', '-hs2_port=21050', '-be_port=22000', '-krpc_port=27000', '-state_store_subscriber_port=23000', '-webserver_port=25000', '-max_minidumps=2', '-logbufsecs=1', '-minidump_path=/tmp/tmpKaSw_w', '--default_query_options='] found
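Judging only from the assertion message, `kill()` in common/impala_cluster.py appears to look the process up again by its full command line at the moment it is asked to kill it. The following is a hypothetical, heavily simplified Python model built on that assumption; `FakeProcessTable` and `ModelProcess` are invented names, not Impala's code. It only reproduces the shape of the failure: if a daemon recorded when the cluster was enumerated has exited (or been replaced) by the time teardown calls `kill()`, the lookup finds nothing and the same assertion fires. It says nothing about what actually removed the impalad.

```python
import signal

# Hypothetical, simplified model of the failing teardown path.
# NOT Impala's impala_cluster.py; all names here are illustrative.


class FakeProcessTable:
    """Stand-in for the OS process list: maps PID -> command line (argv list)."""

    def __init__(self):
        self.procs = {}

    def start(self, pid, cmd):
        self.procs[pid] = list(cmd)

    def find(self, cmd):
        # Assumed behaviour: match live processes by their exact command line.
        return [pid for pid, argv in self.procs.items() if argv == cmd]

    def kill(self, pid):
        self.procs.pop(pid, None)


class ModelProcess:
    def __init__(self, table, cmd):
        self.table = table
        self.cmd = list(cmd)

    def kill(self, sig=signal.SIGKILL):
        # Signal delivery itself is not modelled; only the lookup is.
        pids = self.table.find(self.cmd)
        # Mirrors the failing check: no live process with this command line.
        assert pids, "No processes %s found" % self.cmd
        for pid in pids:
            self.table.kill(pid)


table = FakeProcessTable()
cmd = ["impalad", "-max_minidumps=2"]
table.start(101, cmd)
proc = ModelProcess(table, cmd)  # recorded while the daemon is alive

table.kill(101)  # daemon goes away before teardown gets to it
try:
    proc.kill(signal.SIGKILL)
    message = None
except AssertionError as e:
    message = str(e)
print(message)  # No processes ['impalad', '-max_minidumps=2'] found
```

A teardown that tolerates an already-exited daemon would have to treat "not found" differently from "found but not killable"; whether that is the right fix here depends on why the impalad disappeared.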
Distilled TEST-impala-custom-cluster.xml output:
-- 2019-01-23 08:00:43,585 INFO MainThread: Found 3 impalad/1 statestored/1 catalogd process(es)
…
-- 2019-01-23 08:00:43,667 INFO MainThread: Killing: /data/jenkins/workspace/impala-cdh6.x-exhaustive-release/repos/Impala/be/build/latest/service/statestored -logbufsecs=5 -v=1 -max_log_files=0 -log_filename=statestored -log_dir=/data/jenkins/workspace/impala-cdh6.x-exhaustive-release/repos/Impala/logs/custom_cluster_tests -max_minidumps=2 -logbufsecs=1 -minidump_path=/tmp/tmpKaSw_w (PID: 16809) with signal 10
-- 2019-01-23 08:00:43,692 INFO MainThread: Found 6 impalad/1 statestored/1 catalogd process(es)
...
E AssertionError: No processes ['/data/jenkins/workspace/impala-cdh6.x-exhaustive-release/repos/Impala/be/build/latest/service/impalad
Notice that the main thread appears to be killing the statestore, but fails to kill impalad. Notice also that, in the midst of the code that tries to shut down the cluster, a message reports 6 impalads running, where only 3 were found moments earlier. Is this test multi-threaded? Is there more than one “main thread”? Are these main threads working at cross purposes? What recent change may have caused this?
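The jump from 3 to 6 impalads noted above is easy to miss in a long log, so it can help to check for it mechanically. A minimal sketch using Python's standard `re` module; the regular expression is an assumption based only on the two "Found ..." lines quoted in the distilled output, and `process_counts` / `impalad_count_changes` are hypothetical helper names.

```python
import re

# Matches lines like:
#   "... MainThread: Found 3 impalad/1 statestored/1 catalogd process(es)"
FOUND_RE = re.compile(
    r"Found (\d+) impalad/(\d+) statestored/(\d+) catalogd process\(es\)")


def process_counts(log_text):
    """Return (impalad, statestored, catalogd) tuples for each 'Found' line, in order."""
    return [tuple(int(n) for n in m.groups()) for m in FOUND_RE.finditer(log_text)]


def impalad_count_changes(log_text):
    """Return consecutive (before, after) impalad counts that differ."""
    counts = [c[0] for c in process_counts(log_text)]
    return [(a, b) for a, b in zip(counts, counts[1:]) if a != b]


log = """\
-- 2019-01-23 08:00:43,585 INFO MainThread: Found 3 impalad/1 statestored/1 catalogd process(es)
-- 2019-01-23 08:00:43,692 INFO MainThread: Found 6 impalad/1 statestored/1 catalogd process(es)
"""
print(process_counts(log))         # [(3, 1, 1), (6, 1, 1)]
print(impalad_count_changes(log))  # [(3, 6)]
```

This only flags that the count changed between two enumerations; it does not explain where the extra impalads came from.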
Also, it looks like the script sends signal 10 (SIGUSR1), while the statestore's own log says it received SIGTERM (15):
I0123 08:00:44.086009 16868 thrift-client.cc:78] Couldn't open transport for impala-ec2-centoCaught signal: SIGTERM. Daemon will exit.
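The mismatch rests on the numbering of the two signals. Python's standard `signal` module can map the raw numbers to names; the values below hold on Linux (x86 and ARM), while other platforms number SIGUSR1 differently (for example, 30 on macOS).

```python
import signal

# Map the raw signal numbers seen in this report to their names.
# Signal numbers are platform-specific; this mapping holds on Linux (x86/ARM).
names = {num: signal.Signals(num).name for num in (10, 15)}
print(names)  # {10: 'SIGUSR1', 15: 'SIGTERM'}
```

So a statestore that logs SIGTERM received signal 15, not the 10 reported by the test harness; the mapping alone does not say which process sent it.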
Not terribly familiar with this area of the product, so bumping it over to the BE team.