Description
Here is the output from a failed run of the Python unit tests:
02:53:15 _____________ ERROR at setup of TestClient.test_capture_kudu_error _____________ 02:53:15 02:53:15 cls = <class 'kudu.tests.test_client.TestClient'> 02:53:15 02:53:15 @classmethod 02:53:15 def setUpClass(cls): 02:53:15 cls.cluster_path, master_port = cls.start_cluster() 02:53:15 time.sleep(1) 02:53:15 02:53:15 cls.master_host = '127.0.0.1' 02:53:15 cls.master_port = master_port 02:53:15 02:53:15 cls.client = kudu.connect(cls.master_host, cls.master_port) 02:53:15 02:53:15 cls.schema = cls.example_schema() 02:53:15 cls.partitioning = cls.example_partitioning() 02:53:15 02:53:15 cls.ex_table = 'example-table' 02:53:15 if cls.client.table_exists(cls.ex_table): 02:53:15 cls.client.delete_table(cls.ex_table) 02:53:15 > cls.client.create_table(cls.ex_table, cls.schema, cls.partitioning) 02:53:15 02:53:15 kudu/tests/common.py:140: 02:53:15 _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ 02:53:15 kudu/client.pyx:249: in kudu.client.Client.create_table (kudu/client.cpp:4819) 02:53:15 check_status(s) 02:53:15 _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ 02:53:15 02:53:15 > raise KuduInvalidArgument(c_message) 02:53:15 E KuduInvalidArgument: Error creating table example-table on the master: Not enough live tablet servers to create a table with the requested replication factor 1. 0 tablet servers are alive. 02:53:15 02:53:15 kudu/errors.pyx:60: KuduInvalidArgument 02:53:15 ---------------------------- Captured stderr setup ----------------------------- 02:53:15 I0712 02:53:13.192479 7420 mem_tracker.cc:138] MemTracker: hard memory limit is 94.627029 GB 02:53:15 I0712 02:53:13.192636 7420 mem_tracker.cc:140] MemTracker: soft memory limit is 56.776218 GB 02:53:15 I0712 02:53:13.202077 7420 master_main.cc:58] Initializing master server... 02:53:15 I0712 02:53:13.202827 7420 hybrid_clock.cc:178] HybridClock initialized. Resolution in nanos?: 1 Wait times tolerance adjustment: 1.0005 Current error: 969344 02:53:15 I0712 02:53:13.305677 7420 server_base.cc:164] Could not load existing FS layout: Not found: /home/jenkins-slave/workspace/kudu-3/build/test-tmp/tmp5JN8R3/master/data/instance: No such file or directory (error 2) 02:53:15 I0712 02:53:13.305732 7420 server_base.cc:165] Creating new FS layout 02:53:15 I0712 02:53:13.359714 7420 fs_manager.cc:345] Generated new instance metadata in path /home/jenkins-slave/workspace/kudu-3/build/test-tmp/tmp5JN8R3/master/data/instance: 02:53:15 uuid: "3ace416aa5ec45d692d21875e8c23322" 02:53:15 format_stamp: "Formatted at 2016-07-12 02:53:13 on jenkins-slave-c3oe" 02:53:15 I0712 02:53:13.414095 7420 fs_manager.cc:242] Opened local filesystem: /home/jenkins-slave/workspace/kudu-3/build/test-tmp/tmp5JN8R3/master/data 02:53:15 uuid: "3ace416aa5ec45d692d21875e8c23322" 02:53:15 format_stamp: "Formatted at 2016-07-12 02:53:13 on jenkins-slave-c3oe" 02:53:15 I0712 02:53:13.499505 7420 master_main.cc:61] Starting Master server... 02:53:15 I0712 02:53:13.726842 7420 rpc_server.cc:164] RPC server started. Bound to: 0.0.0.0:52533 02:53:15 I0712 02:53:13.727243 7420 webserver.cc:122] Starting webserver on 0.0.0.0:0 02:53:15 I0712 02:53:13.727269 7420 webserver.cc:127] Document root: /home/jenkins-slave/workspace/kudu-3/www 02:53:15 I0712 02:53:13.727638 7420 webserver.cc:217] Webserver started. Bound to: http://0.0.0.0:47227/ 02:53:15 I0712 02:53:13.728736 7420 server_base.cc:234] Dumped server information to /home/jenkins-slave/workspace/kudu-3/build/test-tmp/tmp5JN8R3/master/config.json 02:53:15 I0712 02:53:13.769598 7518 tablet_bootstrap.cc:393] T 00000000000000000000000000000000 P 3ace416aa5ec45d692d21875e8c23322: Bootstrap starting. 02:53:15 I0712 02:53:13.771863 7518 tablet_bootstrap.cc:553] T 00000000000000000000000000000000 P 3ace416aa5ec45d692d21875e8c23322: Time spent opening tablet: real 0.001s user 0.000s sys 0.000s 02:53:15 I0712 02:53:13.772128 7518 tablet_bootstrap.cc:493] T 00000000000000000000000000000000 P 3ace416aa5ec45d692d21875e8c23322: No blocks or log segments found. Creating new log. 02:53:15 I0712 02:53:13.772845 7518 log.cc:349] Log is configured to *not* fsync() on all Append() calls 02:53:15 I0712 02:53:13.774338 7518 tablet_bootstrap.cc:393] T 00000000000000000000000000000000 P 3ace416aa5ec45d692d21875e8c23322: No bootstrap required, opened a new log 02:53:15 I0712 02:53:13.776185 7518 raft_consensus.cc:269] T 00000000000000000000000000000000 P 3ace416aa5ec45d692d21875e8c23322 [term 0 FOLLOWER]: Replica starting. Triggering 0 pending transactions. Active config: opid_index: -1 peers { permanent_uuid: "3ace416aa5ec45d692d21875e8c23322" member_type: VOTER } 02:53:15 I0712 02:53:13.776408 7518 raft_consensus.cc:301] T 00000000000000000000000000000000 P 3ace416aa5ec45d692d21875e8c23322 [term 0 FOLLOWER]: Consensus starting up: Expiring failure detector timer to make a prompt election more likely 02:53:15 I0712 02:53:13.776487 7518 raft_consensus.cc:499] T 00000000000000000000000000000000 P 3ace416aa5ec45d692d21875e8c23322 [term 0 FOLLOWER]: Becoming Follower/Learner. State: Replica: 3ace416aa5ec45d692d21875e8c23322, State: 1, Role: FOLLOWER 02:53:15 Watermarks: {Received: term: 0 index: 0 Committed: term: 0 index: 0} 02:53:15 I0712 02:53:13.776564 7518 consensus_queue.cc:162] T 00000000000000000000000000000000 P 3ace416aa5ec45d692d21875e8c23322 [NON_LEADER]: Queue going to NON_LEADER mode. State: All replicated op: 0.0, Majority replicated op: 0.0, Committed index: 0.0, Last appended: 0.0, Current term: 0, Majority size: -1, State: 1, Mode: NON_LEADER 02:53:15 I0712 02:53:13.776618 7518 raft_consensus.cc:314] T 00000000000000000000000000000000 P 3ace416aa5ec45d692d21875e8c23322 [term 0 FOLLOWER]: Only one voter in the Raft config. Triggering election immediately 02:53:15 I0712 02:53:13.776641 7518 raft_consensus.cc:370] T 00000000000000000000000000000000 P 3ace416aa5ec45d692d21875e8c23322 [term 0 FOLLOWER]: No leader contacted us within the election timeout. Triggering leader election 02:53:15 I0712 02:53:13.776671 7518 raft_consensus.cc:2019] T 00000000000000000000000000000000 P 3ace416aa5ec45d692d21875e8c23322 [term 0 FOLLOWER]: Advancing to term 1 02:53:15 I0712 02:53:13.777194 7518 raft_consensus.cc:1970] T 00000000000000000000000000000000 P 3ace416aa5ec45d692d21875e8c23322 [term 1 FOLLOWER]: Snoozing failure detection for election timeout plus an additional 1.636s 02:53:15 I0712 02:53:13.777375 7518 raft_consensus.cc:386] T 00000000000000000000000000000000 P 3ace416aa5ec45d692d21875e8c23322 [term 1 FOLLOWER]: Starting election with config: opid_index: -1 peers { permanent_uuid: "3ace416aa5ec45d692d21875e8c23322" member_type: VOTER } 02:53:15 I0712 02:53:13.778064 7518 leader_election.cc:248] T 00000000000000000000000000000000 P 3ace416aa5ec45d692d21875e8c23322 [CANDIDATE]: Term 1 election: Election decided. Result: candidate won. 02:53:15 I0712 02:53:13.778406 7527 raft_consensus.cc:1970] T 00000000000000000000000000000000 P 3ace416aa5ec45d692d21875e8c23322 [term 1 FOLLOWER]: Snoozing failure detection for election timeout plus an additional 1.630s 02:53:15 I0712 02:53:13.778456 7527 raft_consensus.cc:1859] T 00000000000000000000000000000000 P 3ace416aa5ec45d692d21875e8c23322 [term 1 FOLLOWER]: Leader election won for term 1 02:53:15 I0712 02:53:13.790758 7527 raft_consensus.cc:464] T 00000000000000000000000000000000 P 3ace416aa5ec45d692d21875e8c23322 [term 1 LEADER]: Becoming Leader. State: Replica: 3ace416aa5ec45d692d21875e8c23322, State: 1, Role: LEADER 02:53:15 Watermarks: {Received: term: 0 index: 0 Committed: term: 0 index: 0} 02:53:15 I0712 02:53:13.791013 7527 consensus_queue.cc:145] T 00000000000000000000000000000000 P 3ace416aa5ec45d692d21875e8c23322 [LEADER]: Queue going to LEADER mode. State: All replicated op: 0.0, Majority replicated op: 0.0, Committed index: 0.0, Last appended: 0.0, Current term: 1, Majority size: 1, State: 1, Mode: LEADER, active raft config: opid_index: -1 peers { permanent_uuid: "3ace416aa5ec45d692d21875e8c23322" member_type: VOTER } 02:53:15 I0712 02:53:13.794667 7527 sys_catalog.cc:235] T 00000000000000000000000000000000 P 3ace416aa5ec45d692d21875e8c23322 [sys.catalog]: SysCatalogTable state changed. Reason: New leader 3ace416aa5ec45d692d21875e8c23322. Latest consensus state: current_term: 1 leader_uuid: "3ace416aa5ec45d692d21875e8c23322" config { opid_index: -1 peers { permanent_uuid: "3ace416aa5ec45d692d21875e8c23322" member_type: VOTER } } 02:53:15 I0712 02:53:13.794703 7527 sys_catalog.cc:238] T 00000000000000000000000000000000 P 3ace416aa5ec45d692d21875e8c23322 [sys.catalog]: This master's current role is: LEADER 02:53:15 I0712 02:53:13.794970 7518 sys_catalog.cc:235] T 00000000000000000000000000000000 P 3ace416aa5ec45d692d21875e8c23322 [sys.catalog]: SysCatalogTable state changed. Reason: Started TabletPeer. Latest consensus state: current_term: 1 leader_uuid: "3ace416aa5ec45d692d21875e8c23322" config { opid_index: -1 peers { permanent_uuid: "3ace416aa5ec45d692d21875e8c23322" member_type: VOTER } } 02:53:15 I0712 02:53:13.795006 7518 sys_catalog.cc:238] T 00000000000000000000000000000000 P 3ace416aa5ec45d692d21875e8c23322 [sys.catalog]: This master's current role is: LEADER 02:53:15 I0712 02:53:13.795017 7529 sys_catalog.cc:235] T 00000000000000000000000000000000 P 3ace416aa5ec45d692d21875e8c23322 [sys.catalog]: SysCatalogTable state changed. Reason: RaftConsensus started. Latest consensus state: current_term: 1 leader_uuid: "3ace416aa5ec45d692d21875e8c23322" config { opid_index: -1 peers { permanent_uuid: "3ace416aa5ec45d692d21875e8c23322" member_type: VOTER } } 02:53:15 I0712 02:53:13.795081 7529 sys_catalog.cc:238] T 00000000000000000000000000000000 P 3ace416aa5ec45d692d21875e8c23322 [sys.catalog]: This master's current role is: LEADER 02:53:15 I0712 02:53:13.803755 7531 raft_consensus_state.cc:534] T 00000000000000000000000000000000 P 3ace416aa5ec45d692d21875e8c23322 [term 1 LEADER]: Advanced the committed_index across terms. Last committed operation was: term: 0 index: 0 New committed index is: term: 1 index: 1 02:53:15 I0712 02:53:13.804172 7518 sys_catalog.cc:311] T 00000000000000000000000000000000 P 3ace416aa5ec45d692d21875e8c23322 [sys.catalog]: configured and running, proceeding with master startup. 02:53:15 W0712 02:53:13.804466 7534 catalog_manager.cc:345] Catalog manager background task thread going to sleep: Service unavailable: Catalog manager is not initialized. State: 1 02:53:15 I0712 02:53:13.804584 7532 catalog_manager.cc:636] Loading table and tablet metadata into memory... 02:53:15 I0712 02:53:13.804772 7420 master_main.cc:64] Master server successfully started. 02:53:15 I0712 02:53:14.245645 7564 mem_tracker.cc:138] MemTracker: hard memory limit is 94.627029 GB 02:53:15 I0712 02:53:14.245791 7564 mem_tracker.cc:140] MemTracker: soft memory limit is 56.776218 GB 02:53:15 I0712 02:53:14.257587 7562 mem_tracker.cc:138] MemTracker: hard memory limit is 94.627029 GB 02:53:15 I0712 02:53:14.257714 7562 mem_tracker.cc:140] MemTracker: soft memory limit is 56.776218 GB 02:53:15 I0712 02:53:14.266713 7566 mem_tracker.cc:138] MemTracker: hard memory limit is 94.627029 GB 02:53:15 I0712 02:53:14.266863 7566 mem_tracker.cc:140] MemTracker: soft memory limit is 56.776218 GB 02:53:15 I0712 02:53:14.302768 7566 tablet_server_main.cc:54] Initializing tablet server... 02:53:15 I0712 02:53:14.303900 7566 hybrid_clock.cc:178] HybridClock initialized. Resolution in nanos?: 1 Wait times tolerance adjustment: 1.0005 Current error: 969844 02:53:15 I0712 02:53:14.307047 7566 server_base.cc:164] Could not load existing FS layout: Not found: /home/jenkins-slave/workspace/kudu-3/build/test-tmp/tmp5JN8R3/ts/2/data/instance: No such file or directory (error 2) 02:53:15 I0712 02:53:14.307086 7566 server_base.cc:165] Creating new FS layout 02:53:15 I0712 02:53:14.310693 7562 tablet_server_main.cc:54] Initializing tablet server... 02:53:15 I0712 02:53:14.311682 7562 hybrid_clock.cc:178] HybridClock initialized. Resolution in nanos?: 1 Wait times tolerance adjustment: 1.0005 Current error: 969844 02:53:15 I0712 02:53:14.314699 7562 server_base.cc:164] Could not load existing FS layout: Not found: /home/jenkins-slave/workspace/kudu-3/build/test-tmp/tmp5JN8R3/ts/0/data/instance: No such file or directory (error 2) 02:53:15 I0712 02:53:14.314730 7562 server_base.cc:165] Creating new FS layout 02:53:15 I0712 02:53:14.318703 7564 tablet_server_main.cc:54] Initializing tablet server... 02:53:15 I0712 02:53:14.319725 7564 hybrid_clock.cc:178] HybridClock initialized. Resolution in nanos?: 1 Wait times tolerance adjustment: 1.0005 Current error: 969844 02:53:15 I0712 02:53:14.322813 7564 server_base.cc:164] Could not load existing FS layout: Not found: /home/jenkins-slave/workspace/kudu-3/build/test-tmp/tmp5JN8R3/ts/1/data/instance: No such file or directory (error 2) 02:53:15 I0712 02:53:14.322846 7564 server_base.cc:165] Creating new FS layout 02:53:15 I0712 02:53:14.357198 7564 fs_manager.cc:345] Generated new instance metadata in path /home/jenkins-slave/workspace/kudu-3/build/test-tmp/tmp5JN8R3/ts/1/data/instance: 02:53:15 uuid: "76029326f81443679b55fcc12dcb8431" 02:53:15 format_stamp: "Formatted at 2016-07-12 02:53:14 on jenkins-slave-c3oe" 02:53:15 I0712 02:53:14.357671 7566 fs_manager.cc:345] Generated new instance metadata in path /home/jenkins-slave/workspace/kudu-3/build/test-tmp/tmp5JN8R3/ts/2/data/instance: 02:53:15 uuid: "82fdf4247a094384bfcbd0793f59941d" 02:53:15 format_stamp: "Formatted at 2016-07-12 02:53:14 on jenkins-slave-c3oe" 02:53:15 I0712 02:53:14.378212 7562 fs_manager.cc:345] Generated new instance metadata in path /home/jenkins-slave/workspace/kudu-3/build/test-tmp/tmp5JN8R3/ts/0/data/instance: 02:53:15 uuid: "aae8d534a60648bfb24892221b73f659" 02:53:15 format_stamp: "Formatted at 2016-07-12 02:53:14 on jenkins-slave-c3oe" 02:53:15 I0712 02:53:14.499289 7566 fs_manager.cc:242] Opened local filesystem: /home/jenkins-slave/workspace/kudu-3/build/test-tmp/tmp5JN8R3/ts/2/data 02:53:15 uuid: "82fdf4247a094384bfcbd0793f59941d" 02:53:15 format_stamp: "Formatted at 2016-07-12 02:53:14 on jenkins-slave-c3oe" 02:53:15 I0712 02:53:14.531463 7562 fs_manager.cc:242] Opened local filesystem: /home/jenkins-slave/workspace/kudu-3/build/test-tmp/tmp5JN8R3/ts/0/data 02:53:15 uuid: "aae8d534a60648bfb24892221b73f659" 02:53:15 format_stamp: "Formatted at 2016-07-12 02:53:14 on jenkins-slave-c3oe" 02:53:15 I0712 02:53:14.542300 7564 fs_manager.cc:242] Opened local filesystem: /home/jenkins-slave/workspace/kudu-3/build/test-tmp/tmp5JN8R3/ts/1/data 02:53:15 uuid: "76029326f81443679b55fcc12dcb8431" 02:53:15 format_stamp: "Formatted at 2016-07-12 02:53:14 on jenkins-slave-c3oe" 02:53:15 I0712 02:53:14.576563 7566 tablet_server_main.cc:57] Starting tablet server... 02:53:15 I0712 02:53:14.606667 7564 tablet_server_main.cc:57] Starting tablet server... 02:53:15 I0712 02:53:14.624748 7562 tablet_server_main.cc:57] Starting tablet server... 02:53:15 I0712 02:53:15.243650 7492 catalog_manager.cc:762] CreateTable from {real_user=jenkins-slave, eff_user=} at 127.0.0.1:47450: 02:53:15 name: "example-table" 02:53:15 schema { 02:53:15 columns { 02:53:15 name: "key" 02:53:15 type: INT32 02:53:15 is_key: true 02:53:15 is_nullable: false 02:53:15 encoding: AUTO_ENCODING 02:53:15 compression: DEFAULT_COMPRESSION 02:53:15 cfile_block_size: 0 02:53:15 } 02:53:15 columns { 02:53:15 name: "int_val" 02:53:15 type: INT32 02:53:15 is_key: false 02:53:15 is_nullable: true 02:53:15 encoding: AUTO_ENCODING 02:53:15 compression: DEFAULT_COMPRESSION 02:53:15 cfile_block_size: 0 02:53:15 } 02:53:15 columns { 02:53:15 name: "string_val" 02:53:15 type: STRING 02:53:15 is_key: false 02:53:15 is_nullable: true 02:53:15 encoding: AUTO_ENCODING 02:53:15 compression: DEFAULT_COMPRESSION 02:53:15 cfile_block_size: 0 02:53:15 } 02:53:15 } 02:53:15 split_rows_range_bounds { 02:53:15 } 02:53:15 partition_schema { 02:53:15 range_schema { 02:53:15 columns { 02:53:15 name: "key" 02:53:15 } 02:53:15 } 02:53:15 }
Here the CreateTable() call has arrived to the master before the tservers have finished initializing, and thus before they have heartbeated to the master.
We should modify the start_cluster() call to wait until the tservers are initialized. Possibly this means adding --server_dump_info_path=... to the command line arguments and waiting for the file to show up. Or maybe that's insufficient and we should also query the master for the list of tservers. Either way, we need to wait a little longer in case the tservers are slow to start.