Details
-
Bug
-
Status: Resolved
-
Critical
-
Resolution: Fixed
-
Impala 2.10.0
Description
Two of henryr's GVO runs for https://gerrit.cloudera.org/#/c/5715/ failed after Kudu tservers became unresponsive (gvo 1, gvo 2).
It looks to me like Kudu is working through the execution of
test_kudu.py::TestKuduMemLimits::test_low_mem_limit_low_selectivity_scan
Afterwards, though, at least one tserver seems to become fully unresponsive or to crash, although no stack trace or dump appears to be generated.
In https://jenkins.impala.io/job/ubuntu-14.04-from-scratch/1777/ these tests run fine:
...
06:15:16 query_test/test_insert_behaviour.py::TestInsertBehaviour::test_insert_select_with_empty_resultset PASSED
06:16:23 query_test/test_kudu.py::TestKuduOperations::test_kudu_alter_table[exec_option: {'batch_size': 0, 'num_nodes': 0, 'disable_codegen_rows_threshold': 0, 'disable_codegen': False, 'abort_on_error': 1, 'exec_single_node_rows_threshold': 0} | table_format: text/none] PASSED
06:17:06 query_test/test_kudu.py::TestKuduMemLimits::test_low_mem_limit_low_selectivity_scan[exec_option: {'batch_size': 0, 'num_nodes': 0, 'disable_codegen_rows_threshold': 0, 'disable_codegen': False, 'abort_on_error': 1, 'exec_single_node_rows_threshold': 0} | table_format: text/none-1] PASSED
06:17:10 query_test/test_kudu.py::TestKuduMemLimits::test_low_mem_limit_low_selectivity_scan[exec_option: {'batch_size': 0, 'num_nodes': 0, 'disable_codegen_rows_threshold': 0, 'disable_codegen': False, 'abort_on_error': 1, 'exec_single_node_rows_threshold': 0} | table_format: text/none-10] PASSED
06:17:11 query_test/test_kudu.py::TestKuduMemLimits::test_low_mem_limit_low_selectivity_scan[exec_option: {'batch_size': 0, 'num_nodes': 0, 'disable_codegen_rows_threshold': 0, 'disable_codegen': False, 'abort_on_error': 1, 'exec_single_node_rows_threshold': 0} | table_format: text/none-0] PASSED
06:17:13 query_test/test_lifecycle.py::TestFragmentLifecycleWithDebugActions::test_failure_in_prepare PASSED
Shortly after (starting at 6:23), the first tserver starts reporting that other tablet leaders are unavailable:
...
I0727 06:17:06.674669 20878 ts_tablet_manager.cc:1042] T 9bfcf775073844d0aa6251e7ee486375 P b497dbd6bd1a4c998891b00d2493a1bb: Tablet deleted. Last logged OpId: 1.1
I0727 06:17:06.674680 20878 log.cc:974] T 9bfcf775073844d0aa6251e7ee486375 P b497dbd6bd1a4c998891b00d2493a1bb: Deleting WAL directory at /home/ubuntu/Impala/testdata/cluster/cdh5/node-1/var/lib/kudu/ts/wal/wals/9bfcf775073844d0aa6251e7ee486375
I0727 06:17:06.674784 20878 ts_tablet_manager.cc:1060] T 9bfcf775073844d0aa6251e7ee486375 P b497dbd6bd1a4c998891b00d2493a1bb: Deleting consensus metadata
I0727 06:17:06.675710 20877 ts_tablet_manager.cc:1042] T bbdd11b90f804c5a94a3242a27bbe2c7 P b497dbd6bd1a4c998891b00d2493a1bb: Tablet deleted. Last logged OpId: 1.1
I0727 06:17:06.675725 20877 log.cc:974] T bbdd11b90f804c5a94a3242a27bbe2c7 P b497dbd6bd1a4c998891b00d2493a1bb: Deleting WAL directory at /home/ubuntu/Impala/testdata/cluster/cdh5/node-1/var/lib/kudu/ts/wal/wals/bbdd11b90f804c5a94a3242a27bbe2c7
I0727 06:17:06.675817 20877 ts_tablet_manager.cc:1060] T bbdd11b90f804c5a94a3242a27bbe2c7 P b497dbd6bd1a4c998891b00d2493a1bb: Deleting consensus metadata
I0727 06:23:58.746656 114414 raft_consensus.cc:411] T df6cfc0be3494e52b31b93d2298d1663 P b497dbd6bd1a4c998891b00d2493a1bb [term 1 FOLLOWER]: Starting pre-election (detected failure of leader 6f1af300e1d549d6b1cdd8bf3b9aeb9c)
I0727 06:23:58.744894 111995 raft_consensus.cc:411] T 3092a2a1be4e47c3aa26e260c1eea55b P b497dbd6bd1a4c998891b00d2493a1bb [term 1 FOLLOWER]: Starting pre-election (detected failure of leader 3a3c735705964e1badb66c37a66a9096)
I0727 06:23:58.728032 112454 raft_consensus.cc:411] T 4b74a3b8327943648255613c164c0b03 P b497dbd6bd1a4c998891b00d2493a1bb [term 1 FOLLOWER]: Starting pre-election (detected failure of leader 6f1af300e1d549d6b1cdd8bf3b9aeb9c)
I0727 06:23:58.741551 114377 raft_consensus.cc:411] T ada7c413be4941979ed4e6cb659de772 P b497dbd6bd1a4c998891b00d2493a1bb [term 1 FOLLOWER]: Starting pre-election (detected failure of leader 6f1af300e1d549d6b1cdd8bf3b9aeb9c)
...
I0727 06:23:58.960822 111910 raft_consensus.cc:411] T 27a5b15a78254fa1890137a0f3df9276 P b497dbd6bd1a4c998891b00d2493a1bb [term 1 FOLLOWER]: Starting pre-election (detected failure of leader 6f1af300e1d549d6b1cdd8bf3b9aeb9c)
I0727 06:23:58.961027 114591 raft_consensus.cc:411] T ffec3d0d55db4cd1a5f9c9b7e0199acf P b497dbd6bd1a4c998891b00d2493a1bb [term 1 FOLLOWER]: Starting pre-election (detected failure of leader 6f1af300e1d549d6b1cdd8bf3b9aeb9c)
I0727 06:23:58.961103 114665 raft_consensus.cc:411] T 1ed3de3d7ce0420880b2146cc0572329 P b497dbd6bd1a4c998891b00d2493a1bb [term 1 FOLLOWER]: Starting pre-election (detected failure of leader 3a3c735705964e1badb66c37a66a9096)
I0727 06:23:58.961292 112077 raft_consensus.cc:411] T 904259aacdee4b52902b54cf4a48422a P b497dbd6bd1a4c998891b00d2493a1bb [term 1 FOLLOWER]: Starting pre-election (detected failure of leader 6f1af300e1d549d6b1cdd8bf3b9aeb9c)
The other Kudu tserver logs just end at 6:17, but there is no indication that they crashed.
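For anyone triaging a similar failure, here is a rough sketch of how the quiet period and the set of "failed" leaders could be pulled out of a tserver log. It is not part of any Impala or Kudu tooling. The helper names (parse_glog, largest_gap, failed_leaders) are made up for illustration, and the regex assumes the standard glog line prefix seen in the excerpt above. The sample lines are copied from that excerpt.

```python
import re
from collections import Counter
from datetime import datetime

# Assumed glog prefix: severity letter, mmdd, hh:mm:ss.uuuuuu, thread id, file:line]
GLOG_RE = re.compile(
    r"^[IWEF](\d{4}) (\d{2}:\d{2}:\d{2}\.\d{6})\s+\d+ \S+:\d+\] (.*)$"
)
LEADER_RE = re.compile(r"detected failure of leader ([0-9a-f]{32})")


def parse_glog(lines):
    """Return (timestamp, message) pairs for lines matching the glog prefix.

    glog omits the year, so timestamps get strptime's default year; that is
    fine for computing differences within a single log.
    """
    entries = []
    for line in lines:
        m = GLOG_RE.match(line.strip())
        if m:
            ts = datetime.strptime(m.group(1) + " " + m.group(2), "%m%d %H:%M:%S.%f")
            entries.append((ts, m.group(3)))
    return entries


def largest_gap(entries):
    """Return (before, after) timestamps around the longest silence.

    Timestamps are sorted first because threads do not always log in order.
    """
    stamps = sorted(ts for ts, _ in entries)
    if len(stamps) < 2:
        raise ValueError("need at least two log entries")
    return max(zip(stamps, stamps[1:]), key=lambda pair: pair[1] - pair[0])


def failed_leaders(entries):
    """Count how often each leader UUID is reported as failed."""
    hits = (LEADER_RE.search(msg) for _, msg in entries)
    return Counter(m.group(1) for m in hits if m)


SAMPLE = [
    "I0727 06:17:06.675817 20877 ts_tablet_manager.cc:1060] T bbdd11b90f804c5a94a3242a27bbe2c7 P b497dbd6bd1a4c998891b00d2493a1bb: Deleting consensus metadata",
    "I0727 06:23:58.746656 114414 raft_consensus.cc:411] T df6cfc0be3494e52b31b93d2298d1663 P b497dbd6bd1a4c998891b00d2493a1bb [term 1 FOLLOWER]: Starting pre-election (detected failure of leader 6f1af300e1d549d6b1cdd8bf3b9aeb9c)",
    "I0727 06:23:58.744894 111995 raft_consensus.cc:411] T 3092a2a1be4e47c3aa26e260c1eea55b P b497dbd6bd1a4c998891b00d2493a1bb [term 1 FOLLOWER]: Starting pre-election (detected failure of leader 3a3c735705964e1badb66c37a66a9096)",
    "I0727 06:23:58.728032 112454 raft_consensus.cc:411] T 4b74a3b8327943648255613c164c0b03 P b497dbd6bd1a4c998891b00d2493a1bb [term 1 FOLLOWER]: Starting pre-election (detected failure of leader 6f1af300e1d549d6b1cdd8bf3b9aeb9c)",
]

entries = parse_glog(SAMPLE)
before, after = largest_gap(entries)
print("longest silence:", after - before, "from", before.time(), "to", after.time())
for leader, count in failed_leaders(entries).most_common():
    print(leader, count)
```

To run it against a real log, pass the open file object to parse_glog in place of SAMPLE. Comparing the last timestamp in each of the other tservers' logs with the start of the gap here would show whether they all went quiet at the same moment.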
Attachments
Issue Links
- is superseded by
-
IMPALA-5737 Constrain and reduce memory requirements of minicluster and test processes
- Open