IMPALA-5733: Kudu tservers seem to be unresponsive after TestKuduMemLimits

    Description

      Two of henryr's GVO runs for https://gerrit.cloudera.org/#/c/5715/ failed after Kudu tservers became unresponsive (GVO runs 1 and 2).

      It looks to me like Kudu keeps working through the execution of
      test_kudu.py::TestKuduMemLimits::test_low_mem_limit_low_selectivity_scan

      Afterwards, though, at least one tserver seems to become fully unresponsive or crash, although no stack trace or dump seems to be generated.

      In https://jenkins.impala.io/job/ubuntu-14.04-from-scratch/1777/ these tests run fine:

      ...
      06:15:16 query_test/test_insert_behaviour.py::TestInsertBehaviour::test_insert_select_with_empty_resultset PASSED
      06:16:23 query_test/test_kudu.py::TestKuduOperations::test_kudu_alter_table[exec_option: {'batch_size': 0, 'num_nodes': 0, 'disable_codegen_rows_threshold': 0, 'disable_codegen': False, 'abort_on_error': 1, 'exec_single_node_rows_threshold': 0} | table_format: text/none] PASSED
      06:17:06 query_test/test_kudu.py::TestKuduMemLimits::test_low_mem_limit_low_selectivity_scan[exec_option: {'batch_size': 0, 'num_nodes': 0, 'disable_codegen_rows_threshold': 0, 'disable_codegen': False, 'abort_on_error': 1, 'exec_single_node_rows_threshold': 0} | table_format: text/none-1] PASSED
      06:17:10 query_test/test_kudu.py::TestKuduMemLimits::test_low_mem_limit_low_selectivity_scan[exec_option: {'batch_size': 0, 'num_nodes': 0, 'disable_codegen_rows_threshold': 0, 'disable_codegen': False, 'abort_on_error': 1, 'exec_single_node_rows_threshold': 0} | table_format: text/none-10] PASSED
      06:17:11 query_test/test_kudu.py::TestKuduMemLimits::test_low_mem_limit_low_selectivity_scan[exec_option: {'batch_size': 0, 'num_nodes': 0, 'disable_codegen_rows_threshold': 0, 'disable_codegen': False, 'abort_on_error': 1, 'exec_single_node_rows_threshold': 0} | table_format: text/none-0] PASSED
      06:17:13 query_test/test_lifecycle.py::TestFragmentLifecycleWithDebugActions::test_failure_in_prepare PASSED
      

      Shortly after (starting at 06:23), the first tserver starts reporting that tablet leaders on other tservers are unavailable:

      ...
      I0727 06:17:06.674669 20878 ts_tablet_manager.cc:1042] T 9bfcf775073844d0aa6251e7ee486375 P b497dbd6bd1a4c998891b00d2493a1bb: Tablet deleted. Last logged OpId: 1.1
      I0727 06:17:06.674680 20878 log.cc:974] T 9bfcf775073844d0aa6251e7ee486375 P b497dbd6bd1a4c998891b00d2493a1bb: Deleting WAL directory at /home/ubuntu/Impala/testdata/cluster/cdh5/node-1/var/lib/kudu/ts/wal/wals/9bfcf775073844d0aa6251e7ee486375
      I0727 06:17:06.674784 20878 ts_tablet_manager.cc:1060] T 9bfcf775073844d0aa6251e7ee486375 P b497dbd6bd1a4c998891b00d2493a1bb: Deleting consensus metadata
      I0727 06:17:06.675710 20877 ts_tablet_manager.cc:1042] T bbdd11b90f804c5a94a3242a27bbe2c7 P b497dbd6bd1a4c998891b00d2493a1bb: Tablet deleted. Last logged OpId: 1.1
      I0727 06:17:06.675725 20877 log.cc:974] T bbdd11b90f804c5a94a3242a27bbe2c7 P b497dbd6bd1a4c998891b00d2493a1bb: Deleting WAL directory at /home/ubuntu/Impala/testdata/cluster/cdh5/node-1/var/lib/kudu/ts/wal/wals/bbdd11b90f804c5a94a3242a27bbe2c7
      I0727 06:17:06.675817 20877 ts_tablet_manager.cc:1060] T bbdd11b90f804c5a94a3242a27bbe2c7 P b497dbd6bd1a4c998891b00d2493a1bb: Deleting consensus metadata
      I0727 06:23:58.746656 114414 raft_consensus.cc:411] T df6cfc0be3494e52b31b93d2298d1663 P b497dbd6bd1a4c998891b00d2493a1bb [term 1 FOLLOWER]: Starting pre-election (detected failure of leader 6f1af300e1d549d6b1cdd8bf3b9aeb9c)
      I0727 06:23:58.744894 111995 raft_consensus.cc:411] T 3092a2a1be4e47c3aa26e260c1eea55b P b497dbd6bd1a4c998891b00d2493a1bb [term 1 FOLLOWER]: Starting pre-election (detected failure of leader 3a3c735705964e1badb66c37a66a9096)
      I0727 06:23:58.728032 112454 raft_consensus.cc:411] T 4b74a3b8327943648255613c164c0b03 P b497dbd6bd1a4c998891b00d2493a1bb [term 1 FOLLOWER]: Starting pre-election (detected failure of leader 6f1af300e1d549d6b1cdd8bf3b9aeb9c)
      I0727 06:23:58.741551 114377 raft_consensus.cc:411] T ada7c413be4941979ed4e6cb659de772 P b497dbd6bd1a4c998891b00d2493a1bb [term 1 FOLLOWER]: Starting pre-election (detected failure of leader 6f1af300e1d549d6b1cdd8bf3b9aeb9c)
      ...
      I0727 06:23:58.960822 111910 raft_consensus.cc:411] T 27a5b15a78254fa1890137a0f3df9276 P b497dbd6bd1a4c998891b00d2493a1bb [term 1 FOLLOWER]: Starting pre-election (detected failure of leader 6f1af300e1d549d6b1cdd8bf3b9aeb9c)
      I0727 06:23:58.961027 114591 raft_consensus.cc:411] T ffec3d0d55db4cd1a5f9c9b7e0199acf P b497dbd6bd1a4c998891b00d2493a1bb [term 1 FOLLOWER]: Starting pre-election (detected failure of leader 6f1af300e1d549d6b1cdd8bf3b9aeb9c)
      I0727 06:23:58.961103 114665 raft_consensus.cc:411] T 1ed3de3d7ce0420880b2146cc0572329 P b497dbd6bd1a4c998891b00d2493a1bb [term 1 FOLLOWER]: Starting pre-election (detected failure of leader 3a3c735705964e1badb66c37a66a9096)
      I0727 06:23:58.961292 112077 raft_consensus.cc:411] T 904259aacdee4b52902b54cf4a48422a P b497dbd6bd1a4c998891b00d2493a1bb [term 1 FOLLOWER]: Starting pre-election (detected failure of leader 6f1af300e1d549d6b1cdd8bf3b9aeb9c)
      

      The other Kudu tserver logs simply end at 06:17, but there is no indication that they crashed.
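
      To make the pattern in the excerpt easier to check against the full attached logs, here is a minimal sketch of a log summarizer. It assumes the glog line format seen in the excerpt above, counts "detected failure of leader" messages per leader UUID, and reports the longest silent gap between log lines. The names `summarize`, `parse_time` and `SAMPLE`, and the placeholder year, are hypothetical and not part of Impala or Kudu.

```python
import re
from collections import Counter
from datetime import datetime

# glog prefix: severity + MMDD, HH:MM:SS.micros, thread id, file:line]
GLOG_RE = re.compile(
    r"^([IWEF])(\d{4}) (\d{2}:\d{2}:\d{2}\.\d{6})\s+(\d+) ([\w.]+):(\d+)\] (.*)$"
)
LEADER_RE = re.compile(r"detected failure of leader ([0-9a-f]+)")


def parse_time(mmdd, hms, year=2000):
    # glog lines carry no year; the default here is only a placeholder (assumption).
    return datetime.strptime(f"{year}{mmdd} {hms}", "%Y%m%d %H:%M:%S.%f")


def summarize(lines):
    """Return (failures per leader UUID, longest gap between log timestamps)."""
    leaders = Counter()
    times = []
    for line in lines:
        m = GLOG_RE.match(line.strip())
        if not m:
            continue  # skip "..." markers and anything that is not a glog line
        times.append(parse_time(m.group(2), m.group(3)))
        leader = LEADER_RE.search(m.group(7))
        if leader:
            leaders[leader.group(1)] += 1
    # Threads log slightly out of order, so sort before measuring gaps.
    times.sort()
    gap = max((b - a for a, b in zip(times, times[1:])), default=None)
    return leaders, gap


# Four lines copied from the excerpt above.
SAMPLE = [
    "I0727 06:17:06.675817 20877 ts_tablet_manager.cc:1060] T bbdd11b90f804c5a94a3242a27bbe2c7 P b497dbd6bd1a4c998891b00d2493a1bb: Deleting consensus metadata",
    "I0727 06:23:58.746656 114414 raft_consensus.cc:411] T df6cfc0be3494e52b31b93d2298d1663 P b497dbd6bd1a4c998891b00d2493a1bb [term 1 FOLLOWER]: Starting pre-election (detected failure of leader 6f1af300e1d549d6b1cdd8bf3b9aeb9c)",
    "I0727 06:23:58.744894 111995 raft_consensus.cc:411] T 3092a2a1be4e47c3aa26e260c1eea55b P b497dbd6bd1a4c998891b00d2493a1bb [term 1 FOLLOWER]: Starting pre-election (detected failure of leader 3a3c735705964e1badb66c37a66a9096)",
    "I0727 06:23:58.728032 112454 raft_consensus.cc:411] T 4b74a3b8327943648255613c164c0b03 P b497dbd6bd1a4c998891b00d2493a1bb [term 1 FOLLOWER]: Starting pre-election (detected failure of leader 6f1af300e1d549d6b1cdd8bf3b9aeb9c)",
]

leaders, gap = summarize(SAMPLE)
for uuid, count in leaders.most_common():
    print(uuid, count)
print("longest gap between log lines:", gap)
```

      Running the same function over kudu-tserver1.log would show whether the failed leaders map onto the two tservers whose logs stop at 06:17, and whether the quiet period really begins right after the TestKuduMemLimits tablets are deleted.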

      Attachments

        1. kudu-master.log (415 kB, Matthew Jacobs)
        2. impalad1.log.tar.gz (1.50 MB, Matthew Jacobs)
        3. jenkins-console.log (1.39 MB, Matthew Jacobs)
        4. kudu-tserver2.log (1.74 MB, Matthew Jacobs)
        5. kudu-tserver1.log (2.25 MB, Matthew Jacobs)
        6. syslog.out (272 kB, Matthew Jacobs)

            People

              mjacobs Matthew Jacobs