Uploaded image for project: 'Kudu'
  1. Kudu
  2. KUDU-1527

Python unit tests should wait for tservers to up and running

    XMLWordPrintableJSON

    Details

    • Type: Bug
    • Status: Resolved
    • Priority: Major
    • Resolution: Duplicate
    • Affects Version/s: 0.9.1
    • Fix Version/s: 1.0.0
    • Component/s: python
    • Labels:
      None
    • Target Version/s:

      Description

      Here is the output from a failed run of the Python unit tests:

      02:53:15 _____________ ERROR at setup of TestClient.test_capture_kudu_error _____________
      02:53:15 
      02:53:15 cls = <class 'kudu.tests.test_client.TestClient'>
      02:53:15 
      02:53:15     @classmethod
      02:53:15     def setUpClass(cls):
      02:53:15         cls.cluster_path, master_port = cls.start_cluster()
      02:53:15         time.sleep(1)
      02:53:15     
      02:53:15         cls.master_host = '127.0.0.1'
      02:53:15         cls.master_port = master_port
      02:53:15     
      02:53:15         cls.client = kudu.connect(cls.master_host, cls.master_port)
      02:53:15     
      02:53:15         cls.schema = cls.example_schema()
      02:53:15         cls.partitioning = cls.example_partitioning()
      02:53:15     
      02:53:15         cls.ex_table = 'example-table'
      02:53:15         if cls.client.table_exists(cls.ex_table):
      02:53:15             cls.client.delete_table(cls.ex_table)
      02:53:15 >       cls.client.create_table(cls.ex_table, cls.schema, cls.partitioning)
      02:53:15 
      02:53:15 kudu/tests/common.py:140: 
      02:53:15 _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ 
      02:53:15 kudu/client.pyx:249: in kudu.client.Client.create_table (kudu/client.cpp:4819)
      02:53:15     check_status(s)
      02:53:15 _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ 
      02:53:15 
      02:53:15 >   raise KuduInvalidArgument(c_message)
      02:53:15 E   KuduInvalidArgument: Error creating table example-table on the master: Not enough live tablet servers to create a table with the requested replication factor 1. 0 tablet servers are alive.
      02:53:15 
      02:53:15 kudu/errors.pyx:60: KuduInvalidArgument
      02:53:15 ---------------------------- Captured stderr setup -----------------------------
      02:53:15 I0712 02:53:13.192479  7420 mem_tracker.cc:138] MemTracker: hard memory limit is 94.627029 GB
      02:53:15 I0712 02:53:13.192636  7420 mem_tracker.cc:140] MemTracker: soft memory limit is 56.776218 GB
      02:53:15 I0712 02:53:13.202077  7420 master_main.cc:58] Initializing master server...
      02:53:15 I0712 02:53:13.202827  7420 hybrid_clock.cc:178] HybridClock initialized. Resolution in nanos?: 1 Wait times tolerance adjustment: 1.0005 Current error: 969344
      02:53:15 I0712 02:53:13.305677  7420 server_base.cc:164] Could not load existing FS layout: Not found: /home/jenkins-slave/workspace/kudu-3/build/test-tmp/tmp5JN8R3/master/data/instance: No such file or directory (error 2)
      02:53:15 I0712 02:53:13.305732  7420 server_base.cc:165] Creating new FS layout
      02:53:15 I0712 02:53:13.359714  7420 fs_manager.cc:345] Generated new instance metadata in path /home/jenkins-slave/workspace/kudu-3/build/test-tmp/tmp5JN8R3/master/data/instance:
      02:53:15 uuid: "3ace416aa5ec45d692d21875e8c23322"
      02:53:15 format_stamp: "Formatted at 2016-07-12 02:53:13 on jenkins-slave-c3oe"
      02:53:15 I0712 02:53:13.414095  7420 fs_manager.cc:242] Opened local filesystem: /home/jenkins-slave/workspace/kudu-3/build/test-tmp/tmp5JN8R3/master/data
      02:53:15 uuid: "3ace416aa5ec45d692d21875e8c23322"
      02:53:15 format_stamp: "Formatted at 2016-07-12 02:53:13 on jenkins-slave-c3oe"
      02:53:15 I0712 02:53:13.499505  7420 master_main.cc:61] Starting Master server...
      02:53:15 I0712 02:53:13.726842  7420 rpc_server.cc:164] RPC server started. Bound to: 0.0.0.0:52533
      02:53:15 I0712 02:53:13.727243  7420 webserver.cc:122] Starting webserver on 0.0.0.0:0
      02:53:15 I0712 02:53:13.727269  7420 webserver.cc:127] Document root: /home/jenkins-slave/workspace/kudu-3/www
      02:53:15 I0712 02:53:13.727638  7420 webserver.cc:217] Webserver started. Bound to: http://0.0.0.0:47227/
      02:53:15 I0712 02:53:13.728736  7420 server_base.cc:234] Dumped server information to /home/jenkins-slave/workspace/kudu-3/build/test-tmp/tmp5JN8R3/master/config.json
      02:53:15 I0712 02:53:13.769598  7518 tablet_bootstrap.cc:393] T 00000000000000000000000000000000 P 3ace416aa5ec45d692d21875e8c23322: Bootstrap starting.
      02:53:15 I0712 02:53:13.771863  7518 tablet_bootstrap.cc:553] T 00000000000000000000000000000000 P 3ace416aa5ec45d692d21875e8c23322: Time spent opening tablet: real 0.001s	user 0.000s	sys 0.000s
      02:53:15 I0712 02:53:13.772128  7518 tablet_bootstrap.cc:493] T 00000000000000000000000000000000 P 3ace416aa5ec45d692d21875e8c23322: No blocks or log segments found. Creating new log.
      02:53:15 I0712 02:53:13.772845  7518 log.cc:349] Log is configured to *not* fsync() on all Append() calls
      02:53:15 I0712 02:53:13.774338  7518 tablet_bootstrap.cc:393] T 00000000000000000000000000000000 P 3ace416aa5ec45d692d21875e8c23322: No bootstrap required, opened a new log
      02:53:15 I0712 02:53:13.776185  7518 raft_consensus.cc:269] T 00000000000000000000000000000000 P 3ace416aa5ec45d692d21875e8c23322 [term 0 FOLLOWER]: Replica starting. Triggering 0 pending transactions. Active config: opid_index: -1 peers { permanent_uuid: "3ace416aa5ec45d692d21875e8c23322" member_type: VOTER }
      02:53:15 I0712 02:53:13.776408  7518 raft_consensus.cc:301] T 00000000000000000000000000000000 P 3ace416aa5ec45d692d21875e8c23322 [term 0 FOLLOWER]: Consensus starting up: Expiring failure detector timer to make a prompt election more likely
      02:53:15 I0712 02:53:13.776487  7518 raft_consensus.cc:499] T 00000000000000000000000000000000 P 3ace416aa5ec45d692d21875e8c23322 [term 0 FOLLOWER]: Becoming Follower/Learner. State: Replica: 3ace416aa5ec45d692d21875e8c23322, State: 1, Role: FOLLOWER
      02:53:15 Watermarks: {Received: term: 0 index: 0 Committed: term: 0 index: 0}
      02:53:15 I0712 02:53:13.776564  7518 consensus_queue.cc:162] T 00000000000000000000000000000000 P 3ace416aa5ec45d692d21875e8c23322 [NON_LEADER]: Queue going to NON_LEADER mode. State: All replicated op: 0.0, Majority replicated op: 0.0, Committed index: 0.0, Last appended: 0.0, Current term: 0, Majority size: -1, State: 1, Mode: NON_LEADER
      02:53:15 I0712 02:53:13.776618  7518 raft_consensus.cc:314] T 00000000000000000000000000000000 P 3ace416aa5ec45d692d21875e8c23322 [term 0 FOLLOWER]: Only one voter in the Raft config. Triggering election immediately
      02:53:15 I0712 02:53:13.776641  7518 raft_consensus.cc:370] T 00000000000000000000000000000000 P 3ace416aa5ec45d692d21875e8c23322 [term 0 FOLLOWER]: No leader contacted us within the election timeout. Triggering leader election
      02:53:15 I0712 02:53:13.776671  7518 raft_consensus.cc:2019] T 00000000000000000000000000000000 P 3ace416aa5ec45d692d21875e8c23322 [term 0 FOLLOWER]: Advancing to term 1
      02:53:15 I0712 02:53:13.777194  7518 raft_consensus.cc:1970] T 00000000000000000000000000000000 P 3ace416aa5ec45d692d21875e8c23322 [term 1 FOLLOWER]: Snoozing failure detection for election timeout plus an additional 1.636s
      02:53:15 I0712 02:53:13.777375  7518 raft_consensus.cc:386] T 00000000000000000000000000000000 P 3ace416aa5ec45d692d21875e8c23322 [term 1 FOLLOWER]: Starting election with config: opid_index: -1 peers { permanent_uuid: "3ace416aa5ec45d692d21875e8c23322" member_type: VOTER }
      02:53:15 I0712 02:53:13.778064  7518 leader_election.cc:248] T 00000000000000000000000000000000 P 3ace416aa5ec45d692d21875e8c23322 [CANDIDATE]: Term 1 election: Election decided. Result: candidate won.
      02:53:15 I0712 02:53:13.778406  7527 raft_consensus.cc:1970] T 00000000000000000000000000000000 P 3ace416aa5ec45d692d21875e8c23322 [term 1 FOLLOWER]: Snoozing failure detection for election timeout plus an additional 1.630s
      02:53:15 I0712 02:53:13.778456  7527 raft_consensus.cc:1859] T 00000000000000000000000000000000 P 3ace416aa5ec45d692d21875e8c23322 [term 1 FOLLOWER]: Leader election won for term 1
      02:53:15 I0712 02:53:13.790758  7527 raft_consensus.cc:464] T 00000000000000000000000000000000 P 3ace416aa5ec45d692d21875e8c23322 [term 1 LEADER]: Becoming Leader. State: Replica: 3ace416aa5ec45d692d21875e8c23322, State: 1, Role: LEADER
      02:53:15 Watermarks: {Received: term: 0 index: 0 Committed: term: 0 index: 0}
      02:53:15 I0712 02:53:13.791013  7527 consensus_queue.cc:145] T 00000000000000000000000000000000 P 3ace416aa5ec45d692d21875e8c23322 [LEADER]: Queue going to LEADER mode. State: All replicated op: 0.0, Majority replicated op: 0.0, Committed index: 0.0, Last appended: 0.0, Current term: 1, Majority size: 1, State: 1, Mode: LEADER, active raft config: opid_index: -1 peers { permanent_uuid: "3ace416aa5ec45d692d21875e8c23322" member_type: VOTER }
      02:53:15 I0712 02:53:13.794667  7527 sys_catalog.cc:235] T 00000000000000000000000000000000 P 3ace416aa5ec45d692d21875e8c23322 [sys.catalog]: SysCatalogTable state changed. Reason: New leader 3ace416aa5ec45d692d21875e8c23322. Latest consensus state: current_term: 1 leader_uuid: "3ace416aa5ec45d692d21875e8c23322" config { opid_index: -1 peers { permanent_uuid: "3ace416aa5ec45d692d21875e8c23322" member_type: VOTER } }
      02:53:15 I0712 02:53:13.794703  7527 sys_catalog.cc:238] T 00000000000000000000000000000000 P 3ace416aa5ec45d692d21875e8c23322 [sys.catalog]: This master's current role is: LEADER
      02:53:15 I0712 02:53:13.794970  7518 sys_catalog.cc:235] T 00000000000000000000000000000000 P 3ace416aa5ec45d692d21875e8c23322 [sys.catalog]: SysCatalogTable state changed. Reason: Started TabletPeer. Latest consensus state: current_term: 1 leader_uuid: "3ace416aa5ec45d692d21875e8c23322" config { opid_index: -1 peers { permanent_uuid: "3ace416aa5ec45d692d21875e8c23322" member_type: VOTER } }
      02:53:15 I0712 02:53:13.795006  7518 sys_catalog.cc:238] T 00000000000000000000000000000000 P 3ace416aa5ec45d692d21875e8c23322 [sys.catalog]: This master's current role is: LEADER
      02:53:15 I0712 02:53:13.795017  7529 sys_catalog.cc:235] T 00000000000000000000000000000000 P 3ace416aa5ec45d692d21875e8c23322 [sys.catalog]: SysCatalogTable state changed. Reason: RaftConsensus started. Latest consensus state: current_term: 1 leader_uuid: "3ace416aa5ec45d692d21875e8c23322" config { opid_index: -1 peers { permanent_uuid: "3ace416aa5ec45d692d21875e8c23322" member_type: VOTER } }
      02:53:15 I0712 02:53:13.795081  7529 sys_catalog.cc:238] T 00000000000000000000000000000000 P 3ace416aa5ec45d692d21875e8c23322 [sys.catalog]: This master's current role is: LEADER
      02:53:15 I0712 02:53:13.803755  7531 raft_consensus_state.cc:534] T 00000000000000000000000000000000 P 3ace416aa5ec45d692d21875e8c23322 [term 1 LEADER]: Advanced the committed_index across terms. Last committed operation was: term: 0 index: 0 New committed index is: term: 1 index: 1
      02:53:15 I0712 02:53:13.804172  7518 sys_catalog.cc:311] T 00000000000000000000000000000000 P 3ace416aa5ec45d692d21875e8c23322 [sys.catalog]: configured and running, proceeding with master startup.
      02:53:15 W0712 02:53:13.804466  7534 catalog_manager.cc:345] Catalog manager background task thread going to sleep: Service unavailable: Catalog manager is not initialized. State: 1
      02:53:15 I0712 02:53:13.804584  7532 catalog_manager.cc:636] Loading table and tablet metadata into memory...
      02:53:15 I0712 02:53:13.804772  7420 master_main.cc:64] Master server successfully started.
      02:53:15 I0712 02:53:14.245645  7564 mem_tracker.cc:138] MemTracker: hard memory limit is 94.627029 GB
      02:53:15 I0712 02:53:14.245791  7564 mem_tracker.cc:140] MemTracker: soft memory limit is 56.776218 GB
      02:53:15 I0712 02:53:14.257587  7562 mem_tracker.cc:138] MemTracker: hard memory limit is 94.627029 GB
      02:53:15 I0712 02:53:14.257714  7562 mem_tracker.cc:140] MemTracker: soft memory limit is 56.776218 GB
      02:53:15 I0712 02:53:14.266713  7566 mem_tracker.cc:138] MemTracker: hard memory limit is 94.627029 GB
      02:53:15 I0712 02:53:14.266863  7566 mem_tracker.cc:140] MemTracker: soft memory limit is 56.776218 GB
      02:53:15 I0712 02:53:14.302768  7566 tablet_server_main.cc:54] Initializing tablet server...
      02:53:15 I0712 02:53:14.303900  7566 hybrid_clock.cc:178] HybridClock initialized. Resolution in nanos?: 1 Wait times tolerance adjustment: 1.0005 Current error: 969844
      02:53:15 I0712 02:53:14.307047  7566 server_base.cc:164] Could not load existing FS layout: Not found: /home/jenkins-slave/workspace/kudu-3/build/test-tmp/tmp5JN8R3/ts/2/data/instance: No such file or directory (error 2)
      02:53:15 I0712 02:53:14.307086  7566 server_base.cc:165] Creating new FS layout
      02:53:15 I0712 02:53:14.310693  7562 tablet_server_main.cc:54] Initializing tablet server...
      02:53:15 I0712 02:53:14.311682  7562 hybrid_clock.cc:178] HybridClock initialized. Resolution in nanos?: 1 Wait times tolerance adjustment: 1.0005 Current error: 969844
      02:53:15 I0712 02:53:14.314699  7562 server_base.cc:164] Could not load existing FS layout: Not found: /home/jenkins-slave/workspace/kudu-3/build/test-tmp/tmp5JN8R3/ts/0/data/instance: No such file or directory (error 2)
      02:53:15 I0712 02:53:14.314730  7562 server_base.cc:165] Creating new FS layout
      02:53:15 I0712 02:53:14.318703  7564 tablet_server_main.cc:54] Initializing tablet server...
      02:53:15 I0712 02:53:14.319725  7564 hybrid_clock.cc:178] HybridClock initialized. Resolution in nanos?: 1 Wait times tolerance adjustment: 1.0005 Current error: 969844
      02:53:15 I0712 02:53:14.322813  7564 server_base.cc:164] Could not load existing FS layout: Not found: /home/jenkins-slave/workspace/kudu-3/build/test-tmp/tmp5JN8R3/ts/1/data/instance: No such file or directory (error 2)
      02:53:15 I0712 02:53:14.322846  7564 server_base.cc:165] Creating new FS layout
      02:53:15 I0712 02:53:14.357198  7564 fs_manager.cc:345] Generated new instance metadata in path /home/jenkins-slave/workspace/kudu-3/build/test-tmp/tmp5JN8R3/ts/1/data/instance:
      02:53:15 uuid: "76029326f81443679b55fcc12dcb8431"
      02:53:15 format_stamp: "Formatted at 2016-07-12 02:53:14 on jenkins-slave-c3oe"
      02:53:15 I0712 02:53:14.357671  7566 fs_manager.cc:345] Generated new instance metadata in path /home/jenkins-slave/workspace/kudu-3/build/test-tmp/tmp5JN8R3/ts/2/data/instance:
      02:53:15 uuid: "82fdf4247a094384bfcbd0793f59941d"
      02:53:15 format_stamp: "Formatted at 2016-07-12 02:53:14 on jenkins-slave-c3oe"
      02:53:15 I0712 02:53:14.378212  7562 fs_manager.cc:345] Generated new instance metadata in path /home/jenkins-slave/workspace/kudu-3/build/test-tmp/tmp5JN8R3/ts/0/data/instance:
      02:53:15 uuid: "aae8d534a60648bfb24892221b73f659"
      02:53:15 format_stamp: "Formatted at 2016-07-12 02:53:14 on jenkins-slave-c3oe"
      02:53:15 I0712 02:53:14.499289  7566 fs_manager.cc:242] Opened local filesystem: /home/jenkins-slave/workspace/kudu-3/build/test-tmp/tmp5JN8R3/ts/2/data
      02:53:15 uuid: "82fdf4247a094384bfcbd0793f59941d"
      02:53:15 format_stamp: "Formatted at 2016-07-12 02:53:14 on jenkins-slave-c3oe"
      02:53:15 I0712 02:53:14.531463  7562 fs_manager.cc:242] Opened local filesystem: /home/jenkins-slave/workspace/kudu-3/build/test-tmp/tmp5JN8R3/ts/0/data
      02:53:15 uuid: "aae8d534a60648bfb24892221b73f659"
      02:53:15 format_stamp: "Formatted at 2016-07-12 02:53:14 on jenkins-slave-c3oe"
      02:53:15 I0712 02:53:14.542300  7564 fs_manager.cc:242] Opened local filesystem: /home/jenkins-slave/workspace/kudu-3/build/test-tmp/tmp5JN8R3/ts/1/data
      02:53:15 uuid: "76029326f81443679b55fcc12dcb8431"
      02:53:15 format_stamp: "Formatted at 2016-07-12 02:53:14 on jenkins-slave-c3oe"
      02:53:15 I0712 02:53:14.576563  7566 tablet_server_main.cc:57] Starting tablet server...
      02:53:15 I0712 02:53:14.606667  7564 tablet_server_main.cc:57] Starting tablet server...
      02:53:15 I0712 02:53:14.624748  7562 tablet_server_main.cc:57] Starting tablet server...
      02:53:15 I0712 02:53:15.243650  7492 catalog_manager.cc:762] CreateTable from {real_user=jenkins-slave, eff_user=} at 127.0.0.1:47450:
      02:53:15 name: "example-table"
      02:53:15 schema {
      02:53:15   columns {
      02:53:15     name: "key"
      02:53:15     type: INT32
      02:53:15     is_key: true
      02:53:15     is_nullable: false
      02:53:15     encoding: AUTO_ENCODING
      02:53:15     compression: DEFAULT_COMPRESSION
      02:53:15     cfile_block_size: 0
      02:53:15   }
      02:53:15   columns {
      02:53:15     name: "int_val"
      02:53:15     type: INT32
      02:53:15     is_key: false
      02:53:15     is_nullable: true
      02:53:15     encoding: AUTO_ENCODING
      02:53:15     compression: DEFAULT_COMPRESSION
      02:53:15     cfile_block_size: 0
      02:53:15   }
      02:53:15   columns {
      02:53:15     name: "string_val"
      02:53:15     type: STRING
      02:53:15     is_key: false
      02:53:15     is_nullable: true
      02:53:15     encoding: AUTO_ENCODING
      02:53:15     compression: DEFAULT_COMPRESSION
      02:53:15     cfile_block_size: 0
      02:53:15   }
      02:53:15 }
      02:53:15 split_rows_range_bounds {
      02:53:15 }
      02:53:15 partition_schema {
      02:53:15   range_schema {
      02:53:15     columns {
      02:53:15       name: "key"
      02:53:15     }
      02:53:15   }
      02:53:15 }
      

      Here the CreateTable() call has arrived to the master before the tservers have finished initializing, and thus before they have heartbeated to the master.

      We should modify the start_cluster() call to wait until the tservers are initialized. Possibly this means adding --server_dump_info_path=... to the command line arguments and waiting for the file to show up. Or maybe that's insufficient and we should also query the master for the list of tservers. Either way, we need to wait a little longer in case the tservers are slow to start.

        Attachments

          Activity

            People

            • Assignee:
              Unassigned
              Reporter:
              adar Adar Dembo
            • Votes:
              0 Vote for this issue
              Watchers:
              2 Start watching this issue

              Dates

              • Created:
                Updated:
                Resolved: