Details
- Type: Bug
- Status: Resolved
- Priority: Major
- Resolution: Fixed
- Labels: ghx-label-11
Description
We found that test_multiple_coordinators() could fail because _start_impala_cluster() returned a non-zero exit status. test_multiple_coordinators() calls _start_impala_cluster() at https://github.com/apache/impala/blame/master/tests/custom_cluster/test_coordinators.py#L41C10-L41C31.
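For reference, the failing call site looks roughly like the sketch below, reconstructed from the stack trace further down. Only the single _start_impala_cluster() call is taken from the trace; the import path and the surrounding class/test skeleton are assumptions for illustration, not the exact source of test_coordinators.py.

# Sketch of the failing call site (reconstructed, not verbatim source).
from tests.common.custom_cluster_test_suite import CustomClusterTestSuite

class TestCoordinators(CustomClusterTestSuite):

  def test_multiple_coordinators(self):
    # tests/custom_cluster/test_coordinators.py:41 -- start a 3-node cluster
    # in which 2 of the impalads act as coordinators. This is the call that
    # raised CalledProcessError when start-impala-cluster.py exited non-zero.
    self._start_impala_cluster([], num_coordinators=2, cluster_size=3)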
Error Message
CalledProcessError: Command '['/data/jenkins/workspace/impala-asf-master-core-erasure-coding/repos/Impala/bin/start-impala-cluster.py', '--state_store_args=--statestore_update_frequency_ms=50 --statestore_priority_update_frequency_ms=50 --statestore_heartbeat_frequency_ms=50', '--cluster_size=3', '--num_coordinators=2', '--log_dir=/data/jenkins/workspace/impala-asf-master-core-erasure-coding/repos/Impala/logs/custom_cluster_tests', '--log_level=1', '--impalad_args=--default_query_options=']' returned non-zero exit status 1
Stacktrace
custom_cluster/test_coordinators.py:41: in test_multiple_coordinators
    self._start_impala_cluster([], num_coordinators=2, cluster_size=3)
common/custom_cluster_test_suite.py:330: in _start_impala_cluster
    check_call(cmd + options, close_fds=True)
/data/jenkins/workspace/impala-asf-master-core-erasure-coding/Impala-Toolchain/toolchain-packages-gcc10.4.0/python-2.7.16/lib/python2.7/subprocess.py:190: in check_call
    raise CalledProcessError(retcode, cmd)
E   CalledProcessError: Command '['/data/jenkins/workspace/impala-asf-master-core-erasure-coding/repos/Impala/bin/start-impala-cluster.py', '--state_store_args=--statestore_update_frequency_ms=50 --statestore_priority_update_frequency_ms=50 --statestore_heartbeat_frequency_ms=50', '--cluster_size=3', '--num_coordinators=2', '--log_dir=/data/jenkins/workspace/impala-asf-master-core-erasure-coding/repos/Impala/logs/custom_cluster_tests', '--log_level=1', '--impalad_args=--default_query_options=']' returned non-zero exit status 1
The following console output shows that on the second coordinator (port 25001) 'num_known_live_backends' stayed at 2 and never reached 3 within 4 minutes, so the command that starts the cluster failed with a non-zero exit status. A hedged sketch of this polling pattern follows the log.
-- 2023-06-21 20:54:40,594 INFO MainThread: Starting cluster with command: /data/jenkins/workspace/impala-asf-master-core-erasure-coding/repos/Impala/bin/start-impala-cluster.py '--state_store_args=--statestore_update_frequency_ms=50 --statestore_priority_update_frequency_ms=50 --statestore_heartbeat_frequency_ms=50' --cluster_size=3 --num_coordinators=2 --log_dir=/data/jenkins/workspace/impala-asf-master-core-erasure-coding/repos/Impala/logs/custom_cluster_tests --log_level=1 --impalad_args=--default_query_options=
20:54:41 MainThread: Found 0 impalad/0 statestored/0 catalogd process(es)
20:54:41 MainThread: Starting State Store logging to /data/jenkins/workspace/impala-asf-master-core-erasure-coding/repos/Impala/logs/custom_cluster_tests/statestored.INFO
20:54:42 MainThread: Starting Catalog Service logging to /data/jenkins/workspace/impala-asf-master-core-erasure-coding/repos/Impala/logs/custom_cluster_tests/catalogd.INFO
20:54:43 MainThread: Starting Impala Daemon logging to /data/jenkins/workspace/impala-asf-master-core-erasure-coding/repos/Impala/logs/custom_cluster_tests/impalad.INFO
20:54:43 MainThread: Starting Impala Daemon logging to /data/jenkins/workspace/impala-asf-master-core-erasure-coding/repos/Impala/logs/custom_cluster_tests/impalad_node1.INFO
20:54:43 MainThread: Starting Impala Daemon logging to /data/jenkins/workspace/impala-asf-master-core-erasure-coding/repos/Impala/logs/custom_cluster_tests/impalad_node2.INFO
20:54:46 MainThread: Found 3 impalad/1 statestored/1 catalogd process(es)
20:54:46 MainThread: Found 3 impalad/1 statestored/1 catalogd process(es)
20:54:46 MainThread: Getting num_known_live_backends from impala-ec2-centos79-m6i-4xlarge-ondemand-1576.vpc.cloudera.com:25000
20:54:46 MainThread: Waiting for num_known_live_backends=3. Current value: 1
20:54:47 MainThread: Found 3 impalad/1 statestored/1 catalogd process(es)
20:54:47 MainThread: Getting num_known_live_backends from impala-ec2-centos79-m6i-4xlarge-ondemand-1576.vpc.cloudera.com:25000
20:54:47 MainThread: Waiting for num_known_live_backends=3. Current value: 1
20:54:48 MainThread: Found 3 impalad/1 statestored/1 catalogd process(es)
20:54:48 MainThread: Getting num_known_live_backends from impala-ec2-centos79-m6i-4xlarge-ondemand-1576.vpc.cloudera.com:25000
20:54:48 MainThread: num_known_live_backends has reached value: 3
20:54:48 MainThread: Found 3 impalad/1 statestored/1 catalogd process(es)
20:54:48 MainThread: Getting num_known_live_backends from impala-ec2-centos79-m6i-4xlarge-ondemand-1576.vpc.cloudera.com:25001
20:54:48 MainThread: Waiting for num_known_live_backends=3. Current value: 2
...
20:58:48 MainThread: Found 3 impalad/1 statestored/1 catalogd process(es)
20:58:48 MainThread: Getting num_known_live_backends from impala-ec2-centos79-m6i-4xlarge-ondemand-1576.vpc.cloudera.com:25001
20:58:48 MainThread: Waiting for num_known_live_backends=3. Current value: 2
20:58:49 MainThread: Error starting cluster
Traceback (most recent call last):
  File "/data/jenkins/workspace/impala-asf-master-core-erasure-coding/repos/Impala/bin/start-impala-cluster.py", line 931, in <module>
    expected_cluster_size - expected_catalog_delays)
  File "/data/jenkins/workspace/impala-asf-master-core-erasure-coding/repos/Impala/tests/common/impala_cluster.py", line 205, in wait_until_ready
    early_abort_fn=check_processes_still_running)
  File "/data/jenkins/workspace/impala-asf-master-core-erasure-coding/repos/Impala/tests/common/impala_service.py", line 374, in wait_for_num_known_live_backends
    assert 0, 'num_known_live_backends did not reach expected value in time'
AssertionError: num_known_live_backends did not reach expected value in time
-- 2023-06-21 20:58:49,141 DEBUG MainThread: Found 3 impalad/1 statestored/1 catalogd process(es)
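For context, the readiness check that timed out is a poll-until-deadline loop over the backend count each coordinator reports. The sketch below is illustrative only and not the actual implementation in tests/common/impala_service.py: the probe callback, function name, 240-second timeout, and 1-second interval are assumptions; only the log messages and the failing assertion text are taken from the output above.

import time

def wait_for_live_backends(get_num_live_backends, expected, timeout_s=240,
                           interval_s=1):
  # Generic poll-until-deadline loop: call the supplied probe (e.g. a helper
  # that reads a coordinator's debug web UI on port 25000/25001) until it
  # reports the expected backend count, or fail once the deadline passes.
  deadline = time.time() + timeout_s
  while time.time() < deadline:
    current = get_num_live_backends()
    if current == expected:
      print('num_known_live_backends has reached value: %d' % current)
      return
    print('Waiting for num_known_live_backends=%d. Current value: %d'
          % (expected, current))
    time.sleep(interval_s)
  assert 0, 'num_known_live_backends did not reach expected value in time'

In the failing run above, the probe against port 25001 kept returning 2, so the loop exhausted its 4-minute budget and raised the assertion, which start-impala-cluster.py surfaced as the non-zero exit status seen by check_call() in _start_impala_cluster().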