Details
-
Bug
-
Status: Resolved
-
Critical
-
Resolution: Fixed
-
0.19.0
-
None
Description
We have seen the following check failure in tests:
F0423 02:22:33.860647 58518 coordinator.cpp:399] Check failed: !missing Not expecting local replica to be missing position 4 after the writing is done
This is due to a bug in replicated log that the coordinator assume the local replica is always in the network (may not be true). Here is what happened:
1) Coordinator writes position 4
2) Write request is accepted and persisted
3) Coordinator tries to send learned message to the local replica
4) ZK down, network becomes empty, the local replica never receives the learned message
5) Coordinator thinks that the local replica should have learned position 4
The full log is in the following:
[ RUN ] MasterZooKeeperTest.LostZooKeeperCluster I0423 02:22:33.207391 58498 leveldb.cpp:174] Opened db in 128.980841ms I0423 02:22:33.226372 58498 leveldb.cpp:181] Compacted db in 18.88168ms I0423 02:22:33.226482 58498 leveldb.cpp:196] Created db iterator in 26045ns I0423 02:22:33.226506 58498 leveldb.cpp:202] Seeked to beginning of db in 1578ns I0423 02:22:33.226570 58498 leveldb.cpp:271] Iterated through 0 keys in the db in 327ns I0423 02:22:33.226619 58498 replica.cpp:729] Replica recovered with log positions 0 -> 0 with 1 holes and 0 unlearned I0423 02:22:33.265805 58515 leveldb.cpp:304] Persisting metadata (8 bytes) to leveldb took 38.364737ms I0423 02:22:33.265853 58515 replica.cpp:320] Persisted replica status to VOTING I0423 02:22:33.400118 58498 leveldb.cpp:174] Opened db in 133.685751ms I0423 02:22:33.476763 58498 leveldb.cpp:181] Compacted db in 76.545296ms I0423 02:22:33.476889 58498 leveldb.cpp:196] Created db iterator in 18192ns I0423 02:22:33.476927 58498 leveldb.cpp:202] Seeked to beginning of db in 13800ns I0423 02:22:33.477010 58498 leveldb.cpp:271] Iterated through 1 keys in the db in 24427ns I0423 02:22:33.477082 58498 replica.cpp:729] Replica recovered with log positions 0 -> 0 with 1 holes and 0 unlearned 2014-04-23 02:22:33,477:58498(0x7f0c0bc98940):ZOO_INFO@log_env@712: Client environment:zookeeper.version=zookeeper C client 3.4.5 2014-04-23 02:22:33,477:58498(0x7f0c0bc98940):ZOO_INFO@log_env@716: Client environment:host.name=smfd-atr-11-sr1.devel.twitter.com 2014-04-23 02:22:33,477:58498(0x7f0c0bc98940):ZOO_INFO@log_env@723: Client environment:os.name=Linux 2014-04-23 02:22:33,477:58498(0x7f0c0bc98940):ZOO_INFO@log_env@724: Client environment:os.arch=2.6.44-t14.el5 2014-04-23 02:22:33,477:58498(0x7f0c0bc98940):ZOO_INFO@log_env@725: Client environment:os.version=#1 SMP Thu Dec 19 12:29:49 PST 2013 2014-04-23 02:22:33,477:58498(0x7f0c0bc98940):ZOO_INFO@log_env@733: Client environment:user.name=vinod 2014-04-23 02:22:33,477:58498(0x7f0c07c90940):ZOO_INFO@log_env@712: Client environment:zookeeper.version=zookeeper C client 3.4.5 2014-04-23 02:22:33,477:58498(0x7f0c0bc98940):ZOO_INFO@log_env@741: Client environment:user.home=/root I0423 02:22:33.477603 58529 log.cpp:226] Attempting to join replica to ZooKeeper group 2014-04-23 02:22:33,477:58498(0x7f0c07c90940):ZOO_INFO@log_env@716: Client environment:host.name=smfd-atr-11-sr1.devel.twitter.com 2014-04-23 02:22:33,477:58498(0x7f0c07c90940):ZOO_INFO@log_env@723: Client environment:os.name=Linux 2014-04-23 02:22:33,477:58498(0x7f0c07c90940):ZOO_INFO@log_env@724: Client environment:os.arch=2.6.44-t14.el5 2014-04-23 02:22:33,477:58498(0x7f0c07c90940):ZOO_INFO@log_env@725: Client environment:os.version=#1 SMP Thu Dec 19 12:29:49 PST 2013 2014-04-23 02:22:33,477:58498(0x7f0c0bc98940):ZOO_INFO@log_env@753: Client environment:user.dir=/home/vinod/mesos/build 2014-04-23 02:22:33,477:58498(0x7f0c0bc98940):ZOO_INFO@zookeeper_init@786: Initiating client connection, host=127.0.0.1:52633 sessionTimeout=10000 watcher=0x7f0c18b10d5f sessionId=0 sessionPasswd=<null> context=0x7f0bbc033970 flags=0 2014-04-23 02:22:33,477:58498(0x7f0c07c90940):ZOO_INFO@log_env@733: Client environment:user.name=vinod 2014-04-23 02:22:33,477:58498(0x7f0c07c90940):ZOO_INFO@log_env@741: Client environment:user.home=/root 2014-04-23 02:22:33,477:58498(0x7f0c07c90940):ZOO_INFO@log_env@753: Client environment:user.dir=/home/vinod/mesos/build 2014-04-23 02:22:33,477:58498(0x7f0c07c90940):ZOO_INFO@zookeeper_init@786: Initiating client connection, host=127.0.0.1:52633 sessionTimeout=10000 watcher=0x7f0c18b10d5f sessionId=0 sessionPasswd=<null> context=0x1d01ad0 flags=0 I0423 02:22:33.477948 58524 recover.cpp:293] Starting replica recovery 2014-04-23 02:22:33,478:58498(0x7f0bc4a75940):ZOO_INFO@check_events@1703: initiated connection to server [127.0.0.1:52633] 2014-04-23 02:22:33,478:58498(0x7f0bc5478940):ZOO_INFO@check_events@1703: initiated connection to server [127.0.0.1:52633] 2014-04-23 02:22:33,478:58498(0x7f0c0bc98940):ZOO_INFO@log_env@712: Client environment:zookeeper.version=zookeeper C client 3.4.5 2014-04-23 02:22:33,478:58498(0x7f0c0bc98940):ZOO_INFO@log_env@716: Client environment:host.name=smfd-atr-11-sr1.devel.twitter.com 2014-04-23 02:22:33,478:58498(0x7f0c0bc98940):ZOO_INFO@log_env@723: Client environment:os.name=Linux 2014-04-23 02:22:33,478:58498(0x7f0c0bc98940):ZOO_INFO@log_env@724: Client environment:os.arch=2.6.44-t14.el5 2014-04-23 02:22:33,478:58498(0x7f0c0bc98940):ZOO_INFO@log_env@725: Client environment:os.version=#1 SMP Thu Dec 19 12:29:49 PST 2013 2014-04-23 02:22:33,478:58498(0x7f0c0bc98940):ZOO_INFO@log_env@733: Client environment:user.name=vinod 2014-04-23 02:22:33,478:58498(0x7f0c0bc98940):ZOO_INFO@log_env@741: Client environment:user.home=/root 2014-04-23 02:22:33,478:58498(0x7f0c0bc98940):ZOO_INFO@log_env@753: Client environment:user.dir=/home/vinod/mesos/build 2014-04-23 02:22:33,478:58498(0x7f0c0b497940):ZOO_INFO@log_env@712: Client environment:zookeeper.version=zookeeper C client 3.4.5 2014-04-23 02:22:33,478:58498(0x7f0c0bc98940):ZOO_INFO@zookeeper_init@786: Initiating client connection, host=127.0.0.1:52633 sessionTimeout=10000 watcher=0x7f0c18b10d5f sessionId=0 sessionPasswd=<null> context=0x7f0bf80be8b0 flags=0 2014-04-23 02:22:33,478:58498(0x7f0c0b497940):ZOO_INFO@log_env@716: Client environment:host.name=smfd-atr-11-sr1.devel.twitter.com 2014-04-23 02:22:33,478:58498(0x7f0c0b497940):ZOO_INFO@log_env@723: Client environment:os.name=Linux 2014-04-23 02:22:33,478:58498(0x7f0c0b497940):ZOO_INFO@log_env@724: Client environment:os.arch=2.6.44-t14.el5 2014-04-23 02:22:33,478:58498(0x7f0c0b497940):ZOO_INFO@log_env@725: Client environment:os.version=#1 SMP Thu Dec 19 12:29:49 PST 2013 2014-04-23 02:22:33,478:58498(0x7f0c0b497940):ZOO_INFO@log_env@733: Client environment:user.name=vinod I0423 02:22:33.479634 58533 recover.cpp:319] Replica is in VOTING status 2014-04-23 02:22:33,479:58498(0x7f0c0b497940):ZOO_INFO@log_env@741: Client environment:user.home=/root 2014-04-23 02:22:33,479:58498(0x7f0c0b497940):ZOO_INFO@log_env@753: Client environment:user.dir=/home/vinod/mesos/build 2014-04-23 02:22:33,479:58498(0x7f0c0b497940):ZOO_INFO@zookeeper_init@786: Initiating client connection, host=127.0.0.1:52633 sessionTimeout=10000 watcher=0x7f0c18b10d5f sessionId=0 sessionPasswd=<null> context=0x1d2b120 flags=0 I0423 02:22:33.480101 58533 recover.cpp:308] Recover process terminated 2014-04-23 02:22:33,480:58498(0x7f0bc3a73940):ZOO_INFO@check_events@1703: initiated connection to server [127.0.0.1:52633] 2014-04-23 02:22:33,480:58498(0x7f0bc2a71940):ZOO_INFO@check_events@1703: initiated connection to server [127.0.0.1:52633] I0423 02:22:33.481199 58519 master.cpp:264] Master 20140423-022233-1828659978-34577-58498 (smfd-atr-11-sr1.devel.twitter.com) started on 10.35.255.108:34577 I0423 02:22:33.481271 58519 master.cpp:298] Master only allowing authenticated frameworks to register! W0423 02:22:33.481387 58519 master.cpp:328] Permissions on credentials file '/tmp/MasterZooKeeperTest_LostZooKeeperCluster_KXTDKq/credentials' are too open. It is recommended that your credentials file is NOT accessible by others. I0423 02:22:33.487786 58532 contender.cpp:127] Joining the ZK group 2014-04-23 02:22:33,518:58498(0x7f0bc4a75940):ZOO_INFO@check_events@1750: session establishment complete on server [127.0.0.1:52633], sessionId=0x1458c6446740000, negotiated timeout=10000 I0423 02:22:33.518537 58529 group.cpp:310] Group process ((100)@10.35.255.108:34577) connected to ZooKeeper I0423 02:22:33.518580 58529 group.cpp:781] Syncing group operations: queue size (joins, cancels, datas) = (0, 0, 0) I0423 02:22:33.518615 58529 group.cpp:382] Trying to create path '/znode/replicas' in ZooKeeper 2014-04-23 02:22:33,540:58498(0x7f0bc5478940):ZOO_INFO@check_events@1750: session establishment complete on server [127.0.0.1:52633], sessionId=0x1458c6446740001, negotiated timeout=10000 I0423 02:22:33.540578 58526 group.cpp:310] Group process ((101)@10.35.255.108:34577) connected to ZooKeeper I0423 02:22:33.540634 58526 group.cpp:781] Syncing group operations: queue size (joins, cancels, datas) = (1, 0, 0) I0423 02:22:33.540653 58526 group.cpp:382] Trying to create path '/znode/replicas' in ZooKeeper 2014-04-23 02:22:33,540:58498(0x7f0bc3a73940):ZOO_INFO@check_events@1750: session establishment complete on server [127.0.0.1:52633], sessionId=0x1458c6446740002, negotiated timeout=10000 I0423 02:22:33.541086 58527 group.cpp:310] Group process ((104)@10.35.255.108:34577) connected to ZooKeeper I0423 02:22:33.541121 58527 group.cpp:781] Syncing group operations: queue size (joins, cancels, datas) = (1, 0, 0) I0423 02:22:33.541143 58527 group.cpp:382] Trying to create path '/znode' in ZooKeeper 2014-04-23 02:22:33,541:58498(0x7f0bc2a71940):ZOO_INFO@check_events@1750: session establishment complete on server [127.0.0.1:52633], sessionId=0x1458c6446740003, negotiated timeout=10000 I0423 02:22:33.541852 58519 group.cpp:310] Group process ((106)@10.35.255.108:34577) connected to ZooKeeper I0423 02:22:33.541923 58519 group.cpp:781] Syncing group operations: queue size (joins, cancels, datas) = (0, 0, 0) I0423 02:22:33.541944 58519 group.cpp:382] Trying to create path '/znode' in ZooKeeper I0423 02:22:33.565755 58514 contender.cpp:243] New candidate (id='0') has entered the contest for leadership I0423 02:22:33.574548 58515 detector.cpp:134] Detected a new leader: (id='0') I0423 02:22:33.574959 58527 group.cpp:655] Trying to get '/znode/info_0000000000' in ZooKeeper I0423 02:22:33.583420 58529 network.hpp:413] ZooKeeper group memberships changed I0423 02:22:33.583534 58528 detector.cpp:376] A new leading master (UPID=master@10.35.255.108:34577) is detected I0423 02:22:33.583587 58513 group.cpp:655] Trying to get '/znode/replicas/0000000000' in ZooKeeper I0423 02:22:33.584410 58520 master.cpp:940] The newly elected leader is master@10.35.255.108:34577 with id 20140423-022233-1828659978-34577-58498 I0423 02:22:33.584434 58520 master.cpp:950] Elected as the leading master! I0423 02:22:33.584473 58520 master.cpp:773] Recovering from registrar I0423 02:22:33.584812 58529 registrar.cpp:217] Recovering registrar I0423 02:22:33.585399 58514 network.hpp:451] ZooKeeper group PIDs: { log-replica(6)@10.35.255.108:34577 } I0423 02:22:33.586002 58524 log.cpp:639] Attempting to start the writer I0423 02:22:33.587383 58529 replica.cpp:474] Replica received implicit promise request with proposal 1 I0423 02:22:33.617635 58529 leveldb.cpp:304] Persisting metadata (8 bytes) to leveldb took 30.198423ms I0423 02:22:33.617684 58529 replica.cpp:342] Persisted promised to 1 I0423 02:22:33.618604 58520 coordinator.cpp:229] Coordinator attemping to fill missing position I0423 02:22:33.621388 58518 replica.cpp:375] Replica received explicit promise request for position 0 with proposal 2 I0423 02:22:33.642787 58518 leveldb.cpp:341] Persisting action (8 bytes) to leveldb took 21.299332ms I0423 02:22:33.642882 58518 replica.cpp:664] Persisted action at 0 I0423 02:22:33.644448 58518 replica.cpp:508] Replica received write request for position 0 I0423 02:22:33.644603 58518 leveldb.cpp:436] Reading position from leveldb took 23568ns I0423 02:22:33.651110 58518 leveldb.cpp:341] Persisting action (14 bytes) to leveldb took 6.450127ms I0423 02:22:33.651197 58518 replica.cpp:664] Persisted action at 0 I0423 02:22:33.651711 58533 replica.cpp:643] Replica received learned notice for position 0 I0423 02:22:33.659396 58533 leveldb.cpp:341] Persisting action (16 bytes) to leveldb took 7.652094ms I0423 02:22:33.659435 58533 replica.cpp:664] Persisted action at 0 I0423 02:22:33.659462 58533 replica.cpp:649] Replica learned NOP action at position 0 I0423 02:22:33.660080 58523 log.cpp:655] Writer started with ending position 0 I0423 02:22:33.662197 58515 leveldb.cpp:436] Reading position from leveldb took 42564ns I0423 02:22:33.666170 58533 registrar.cpp:243] Successfully recovered registrar I0423 02:22:33.666329 58533 registrar.cpp:314] Attempting to update the 'registry' I0423 02:22:33.670188 58523 log.cpp:663] Attempting to append 155 bytes to the log I0423 02:22:33.670377 58520 coordinator.cpp:339] Coordinator attempting to write APPEND action at position 1 I0423 02:22:33.672163 58522 replica.cpp:508] Replica received write request for position 1 I0423 02:22:33.676118 58522 leveldb.cpp:341] Persisting action (174 bytes) to leveldb took 3.910783ms I0423 02:22:33.676208 58522 replica.cpp:664] Persisted action at 1 I0423 02:22:33.677208 58533 replica.cpp:643] Replica received learned notice for position 1 I0423 02:22:33.684420 58533 leveldb.cpp:341] Persisting action (176 bytes) to leveldb took 7.123224ms I0423 02:22:33.684505 58533 replica.cpp:664] Persisted action at 1 I0423 02:22:33.684530 58533 replica.cpp:649] Replica learned APPEND action at position 1 I0423 02:22:33.685788 58533 registrar.cpp:356] Successfully updated 'registry' I0423 02:22:33.686305 58520 log.cpp:682] Attempting to truncate the log to 1 I0423 02:22:33.686362 58528 master.cpp:800] Recovered 0 slaves from the Registry (117B) ; allowing 10mins for slaves to re-register I0423 02:22:33.686494 58519 coordinator.cpp:339] Coordinator attempting to write TRUNCATE action at position 2 I0423 02:22:33.687501 58529 replica.cpp:508] Replica received write request for position 2 2014-04-23 02:22:33,688:58498(0x7f0c104a1940):ZOO_INFO@log_env@712: Client environment:zookeeper.version=zookeeper C client 3.4.5 2014-04-23 02:22:33,688:58498(0x7f0c104a1940):ZOO_INFO@log_env@716: Client environment:host.name=smfd-atr-11-sr1.devel.twitter.com 2014-04-23 02:22:33,688:58498(0x7f0c104a1940):ZOO_INFO@log_env@723: Client environment:os.name=Linux 2014-04-23 02:22:33,688:58498(0x7f0c104a1940):ZOO_INFO@log_env@724: Client environment:os.arch=2.6.44-t14.el5 2014-04-23 02:22:33,688:58498(0x7f0c104a1940):ZOO_INFO@log_env@725: Client environment:os.version=#1 SMP Thu Dec 19 12:29:49 PST 2013 2014-04-23 02:22:33,688:58498(0x7f0c104a1940):ZOO_INFO@log_env@733: Client environment:user.name=vinod 2014-04-23 02:22:33,688:58498(0x7f0c104a1940):ZOO_INFO@log_env@741: Client environment:user.home=/root 2014-04-23 02:22:33,689:58498(0x7f0c104a1940):ZOO_INFO@log_env@753: Client environment:user.dir=/home/vinod/mesos/build 2014-04-23 02:22:33,689:58498(0x7f0c104a1940):ZOO_INFO@zookeeper_init@786: Initiating client connection, host=127.0.0.1:52633 sessionTimeout=10000 watcher=0x7f0c18b10d5f sessionId=0 sessionPasswd=<null> context=0x7f0c002804f0 flags=0 I0423 02:22:33.689520 58527 slave.cpp:129] Slave started on 1)@10.35.255.108:34577 2014-04-23 02:22:33,689:58498(0x7f0bc1a6f940):ZOO_INFO@check_events@1703: initiated connection to server [127.0.0.1:52633] I0423 02:22:33.689576 58527 slave.cpp:138] Moving slave process into its own cgroup 2014-04-23 02:22:33,690:58498(0x7f0c08491940):ZOO_INFO@log_env@712: Client environment:zookeeper.version=zookeeper C client 3.4.5 I0423 02:22:33.690764 58498 sched.cpp:121] Version: 0.19.0 2014-04-23 02:22:33,690:58498(0x7f0c08491940):ZOO_INFO@log_env@716: Client environment:host.name=smfd-atr-11-sr1.devel.twitter.com 2014-04-23 02:22:33,690:58498(0x7f0c08491940):ZOO_INFO@log_env@723: Client environment:os.name=Linux 2014-04-23 02:22:33,690:58498(0x7f0c08491940):ZOO_INFO@log_env@724: Client environment:os.arch=2.6.44-t14.el5 2014-04-23 02:22:33,690:58498(0x7f0c08491940):ZOO_INFO@log_env@725: Client environment:os.version=#1 SMP Thu Dec 19 12:29:49 PST 2013 2014-04-23 02:22:33,691:58498(0x7f0c08491940):ZOO_INFO@log_env@733: Client environment:user.name=vinod 2014-04-23 02:22:33,691:58498(0x7f0c08491940):ZOO_INFO@log_env@741: Client environment:user.home=/root 2014-04-23 02:22:33,691:58498(0x7f0c08491940):ZOO_INFO@log_env@753: Client environment:user.dir=/home/vinod/mesos/build 2014-04-23 02:22:33,691:58498(0x7f0c08491940):ZOO_INFO@zookeeper_init@786: Initiating client connection, host=127.0.0.1:52633 sessionTimeout=10000 watcher=0x7f0c18b10d5f sessionId=0 sessionPasswd=<null> context=0x1c9b7d0 flags=0 2014-04-23 02:22:33,691:58498(0x7f0bc0a6d940):ZOO_INFO@check_events@1703: initiated connection to server [127.0.0.1:52633] I0423 02:22:33.692736 58529 leveldb.cpp:341] Persisting action (16 bytes) to leveldb took 5.127297ms I0423 02:22:33.692821 58529 replica.cpp:664] Persisted action at 2 I0423 02:22:33.693465 58524 replica.cpp:643] Replica received learned notice for position 2 2014-04-23 02:22:33,698:58498(0x7f0bc1a6f940):ZOO_INFO@check_events@1750: session establishment complete on server [127.0.0.1:52633], sessionId=0x1458c6446740004, negotiated timeout=10000 I0423 02:22:33.699019 58516 group.cpp:310] Group process ((116)@10.35.255.108:34577) connected to ZooKeeper I0423 02:22:33.699064 58516 group.cpp:781] Syncing group operations: queue size (joins, cancels, datas) = (0, 0, 0) I0423 02:22:33.699084 58516 group.cpp:382] Trying to create path '/znode' in ZooKeeper I0423 02:22:33.701099 58524 leveldb.cpp:341] Persisting action (18 bytes) to leveldb took 7.583491ms I0423 02:22:33.701223 58524 leveldb.cpp:399] Deleting ~1 keys from leveldb took 33262ns I0423 02:22:33.701306 58524 replica.cpp:664] Persisted action at 2 I0423 02:22:33.701359 58524 replica.cpp:649] Replica learned TRUNCATE action at position 2 2014-04-23 02:22:33,706:58498(0x7f0bc0a6d940):ZOO_INFO@check_events@1750: session establishment complete on server [127.0.0.1:52633], sessionId=0x1458c6446740005, negotiated timeout=10000 I0423 02:22:33.707068 58522 group.cpp:310] Group process ((122)@10.35.255.108:34577) connected to ZooKeeper I0423 02:22:33.707278 58522 group.cpp:781] Syncing group operations: queue size (joins, cancels, datas) = (0, 0, 0) I0423 02:22:33.707310 58522 group.cpp:382] Trying to create path '/znode' in ZooKeeper I0423 02:22:33.709638 58516 detector.cpp:134] Detected a new leader: (id='0') I0423 02:22:33.709689 58529 detector.cpp:134] Detected a new leader: (id='0') I0423 02:22:33.709972 58534 group.cpp:655] Trying to get '/znode/info_0000000000' in ZooKeeper I0423 02:22:33.710321 58524 group.cpp:655] Trying to get '/znode/info_0000000000' in ZooKeeper I0423 02:22:33.712035 58531 detector.cpp:376] A new leading master (UPID=master@10.35.255.108:34577) is detected I0423 02:22:33.712049 58523 detector.cpp:376] A new leading master (UPID=master@10.35.255.108:34577) is detected I0423 02:22:33.712280 58517 sched.cpp:217] New master detected at master@10.35.255.108:34577 I0423 02:22:33.712394 58517 sched.cpp:268] Authenticating with master master@10.35.255.108:34577 I0423 02:22:33.712681 58532 authenticatee.hpp:104] Initializing client SASL I0423 02:22:33.714192 58532 authenticatee.hpp:128] Creating new client SASL connection I0423 02:22:33.714577 58533 master.cpp:2715] Authenticating framework at scheduler(1)@10.35.255.108:34577 I0423 02:22:33.715448 58535 authenticator.hpp:91] Initializing server SASL I0423 02:22:33.716045 58535 authenticator.hpp:148] Creating new server SASL connection I0423 02:22:33.716263 58535 authenticatee.hpp:219] Received SASL authentication mechanisms: CRAM-MD5 I0423 02:22:33.716431 58535 authenticatee.hpp:245] Attempting to authenticate with mechanism 'CRAM-MD5' I0423 02:22:33.716552 58526 authenticator.hpp:254] Received SASL authentication start I0423 02:22:33.716640 58526 authenticator.hpp:342] Authentication requires more steps I0423 02:22:33.716739 58516 authenticatee.hpp:265] Received SASL authentication step I0423 02:22:33.716943 58533 authenticator.hpp:282] Received SASL authentication step I0423 02:22:33.717118 58533 authenticator.hpp:334] Authentication success I0423 02:22:33.717270 58526 authenticatee.hpp:305] Authentication success I0423 02:22:33.717320 58533 master.cpp:2755] Successfully authenticated framework at scheduler(1)@10.35.255.108:34577 I0423 02:22:33.717632 58526 sched.cpp:342] Successfully authenticated with master master@10.35.255.108:34577 I0423 02:22:33.717903 58533 master.cpp:997] Received registration request from scheduler(1)@10.35.255.108:34577 I0423 02:22:33.718073 58533 master.cpp:1015] Registering framework 20140423-022233-1828659978-34577-58498-0000 at scheduler(1)@10.35.255.108:34577 I0423 02:22:33.718497 58530 sched.cpp:392] Framework registered with 20140423-022233-1828659978-34577-58498-0000 I0423 02:22:33.718652 58513 hierarchical_allocator_process.hpp:332] Added framework 20140423-022233-1828659978-34577-58498-0000 I0423 02:22:33.766744 58527 slave.cpp:138] Moving slave process into its own cgroup I0423 02:22:33.817608 58527 slave.cpp:217] Slave resources: cpus(*):2; mem(*):1024; disk(*):1024; ports(*):[31000-32000] I0423 02:22:33.818116 58527 slave.cpp:245] Slave hostname: smfd-atr-11-sr1.devel.twitter.com I0423 02:22:33.818147 58527 slave.cpp:246] Slave checkpoint: false I0423 02:22:33.819965 58526 state.cpp:33] Recovering state from '/tmp/MasterZooKeeperTest_LostZooKeeperCluster_AwmIH4/meta' I0423 02:22:33.820495 58533 status_update_manager.cpp:193] Recovering status update manager I0423 02:22:33.821004 58533 slave.cpp:2791] Finished recovery I0423 02:22:33.821710 58521 slave.cpp:496] New master detected at master@10.35.255.108:34577 I0423 02:22:33.821869 58533 status_update_manager.cpp:167] New master detected at master@10.35.255.108:34577 I0423 02:22:33.822451 58521 slave.cpp:521] Detecting new master I0423 02:22:33.823314 58514 registrar.cpp:314] Attempting to update the 'registry' I0423 02:22:33.826674 58524 log.cpp:663] Attempting to append 382 bytes to the log I0423 02:22:33.826882 58524 coordinator.cpp:339] Coordinator attempting to write APPEND action at position 3 I0423 02:22:33.828371 58514 replica.cpp:508] Replica received write request for position 3 I0423 02:22:33.842730 58514 leveldb.cpp:341] Persisting action (401 bytes) to leveldb took 14.324743ms I0423 02:22:33.842820 58514 replica.cpp:664] Persisted action at 3 I0423 02:22:33.843473 58531 replica.cpp:643] Replica received learned notice for position 3 I0423 02:22:33.851027 58531 leveldb.cpp:341] Persisting action (403 bytes) to leveldb took 7.469229ms I0423 02:22:33.851112 58531 replica.cpp:664] Persisted action at 3 I0423 02:22:33.851140 58531 replica.cpp:649] Replica learned APPEND action at position 3 I0423 02:22:33.852147 58517 registrar.cpp:356] Successfully updated 'registry' I0423 02:22:33.852347 58530 log.cpp:682] Attempting to truncate the log to 3 I0423 02:22:33.852566 58522 master.cpp:2150] Admitted slave on smfd-atr-11-sr1.devel.twitter.com at slave(1)@10.35.255.108:34577 I0423 02:22:33.852670 58522 master.cpp:3204] Adding slave 20140423-022233-1828659978-34577-58498-0 at smfd-atr-11-sr1.devel.twitter.com with cpus(*):2; mem(*):1024; disk(*):1024; ports(*):[31000-32000] I0423 02:22:33.852726 58533 coordinator.cpp:339] Coordinator attempting to write TRUNCATE action at position 4 I0423 02:22:33.853090 58517 slave.cpp:539] Registered with master master@10.35.255.108:34577; given slave ID 20140423-022233-1828659978-34577-58498-0 I0423 02:22:33.854205 58515 hierarchical_allocator_process.hpp:445] Added slave 20140423-022233-1828659978-34577-58498-0 (smfd-atr-11-sr1.devel.twitter.com) with cpus(*):2; mem(*):1024; disk(*):1024; ports(*):[31000-32000] (and cpus(*):2; mem(*):1024; disk(*):1024; ports(*):[31000-32000] available) I0423 02:22:33.854622 58518 replica.cpp:508] Replica received write request for position 4 2014-04-23 02:22:33,855:58498(0x7f0bc3a73940):ZOO_ERROR@handle_socket_error_msg@1721: Socket [127.0.0.1:52633] zk retcode=-4, errno=112(Host is down): failed while receiving a server response I0423 02:22:33.855417 58515 master.cpp:2676] Sending 1 offers to framework 20140423-022233-1828659978-34577-58498-0000 2014-04-23 02:22:33,855:58498(0x7f0bc5478940):ZOO_ERROR@handle_socket_error_msg@1721: Socket [127.0.0.1:52633] zk retcode=-4, errno=112(Host is down): failed while receiving a server response I0423 02:22:33.855680 58517 group.cpp:415] Lost connection to ZooKeeper, attempting to reconnect ... 2014-04-23 02:22:33,855:58498(0x7f0bc4a75940):ZOO_ERROR@handle_socket_error_msg@1721: Socket [127.0.0.1:52633] zk retcode=-4, errno=112(Host is down): failed while receiving a server response 2014-04-23 02:22:33,856:58498(0x7f0bc1a6f940):ZOO_ERROR@handle_socket_error_msg@1721: Socket [127.0.0.1:52633] zk retcode=-4, errno=112(Host is down): failed while receiving a server response 2014-04-23 02:22:33,856:58498(0x7f0bc2a71940):ZOO_ERROR@handle_socket_error_msg@1721: Socket [127.0.0.1:52633] zk retcode=-4, errno=112(Host is down): failed while receiving a server response I0423 02:22:33.856515 58526 group.cpp:415] Lost connection to ZooKeeper, attempting to reconnect ... I0423 02:22:33.856559 58525 group.cpp:415] Lost connection to ZooKeeper, attempting to reconnect ... I0423 02:22:33.856650 58530 group.cpp:415] Lost connection to ZooKeeper, attempting to reconnect ... 2014-04-23 02:22:33,856:58498(0x7f0bc0a6d940):ZOO_ERROR@handle_socket_error_msg@1721: Socket [127.0.0.1:52633] zk retcode=-4, errno=112(Host is down): failed while receiving a server response I0423 02:22:33.856796 58535 group.cpp:415] Lost connection to ZooKeeper, attempting to reconnect ... I0423 02:22:33.856910 58529 group.cpp:415] Lost connection to ZooKeeper, attempting to reconnect ... I0423 02:22:33.857614 58498 zookeeper_test_server.cpp:122] Shutdown ZooKeeperTestServer on port 52633 W0423 02:22:33.858124 58532 group.cpp:453] Timed out waiting to reconnect to ZooKeeper. Forcing ZooKeeper session (sessionId=1458c6446740001) expiration W0423 02:22:33.858194 58521 group.cpp:453] Timed out waiting to reconnect to ZooKeeper. Forcing ZooKeeper session (sessionId=1458c6446740004) expiration I0423 02:22:33.858209 58532 group.cpp:469] ZooKeeper session expired I0423 02:22:33.858232 58521 group.cpp:469] ZooKeeper session expired W0423 02:22:33.858266 58535 group.cpp:453] Timed out waiting to reconnect to ZooKeeper. Forcing ZooKeeper session (sessionId=1458c6446740002) expiration W0423 02:22:33.858356 58526 group.cpp:453] Timed out waiting to reconnect to ZooKeeper. Forcing ZooKeeper session (sessionId=1458c6446740000) expiration I0423 02:22:33.858410 58535 group.cpp:469] ZooKeeper session expired W0423 02:22:33.858453 58525 group.cpp:453] Timed out waiting to reconnect to ZooKeeper. Forcing ZooKeeper session (sessionId=1458c6446740005) expiration W0423 02:22:33.858391 58531 group.cpp:453] Timed out waiting to reconnect to ZooKeeper. Forcing ZooKeeper session (sessionId=1458c6446740003) expiration I0423 02:22:33.858507 58526 group.cpp:469] ZooKeeper session expired I0423 02:22:33.858486 58525 group.cpp:469] ZooKeeper session expired I0423 02:22:33.858474 58533 log.cpp:360] Renewing replica group membership 2014-04-23 02:22:33,858:58498(0x7f0c0f49f940):ZOO_INFO@zookeeper_close@2522: Freeing zookeeper resources for sessionId=0x1458c6446740004 I0423 02:22:33.858518 58531 group.cpp:469] ZooKeeper session expired 2014-04-23 02:22:33,858:58498(0x7f0c0f49f940):ZOO_INFO@log_env@712: Client environment:zookeeper.version=zookeeper C client 3.4.5 I0423 02:22:33.858690 58513 detector.cpp:134] Detected a new leader: None 2014-04-23 02:22:33,858:58498(0x7f0c0f49f940):ZOO_INFO@log_env@716: Client environment:host.name=smfd-atr-11-sr1.devel.twitter.com I0423 02:22:33.858649 58520 network.hpp:413] ZooKeeper group memberships changed 2014-04-23 02:22:33,858:58498(0x7f0c0f49f940):ZOO_INFO@log_env@723: Client environment:os.name=Linux 2014-04-23 02:22:33,858:58498(0x7f0c0f49f940):ZOO_INFO@log_env@724: Client environment:os.arch=2.6.44-t14.el5 I0423 02:22:33.858753 58519 contender.cpp:192] Membership cancelled: 0 2014-04-23 02:22:33,858:58498(0x7f0c0f49f940):ZOO_INFO@log_env@725: Client environment:os.version=#1 SMP Thu Dec 19 12:29:49 PST 2013 I0423 02:22:33.858796 58517 detector.cpp:134] Detected a new leader: None 2014-04-23 02:22:33,858:58498(0x7f0c0f49f940):ZOO_INFO@log_env@733: Client environment:user.name=vinod I0423 02:22:33.858533 58516 detector.cpp:134] Detected a new leader: None 2014-04-23 02:22:33,859:58498(0x7f0c09c94940):ZOO_INFO@zookeeper_close@2522: Freeing zookeeper resources for sessionId=0x1458c6446740001 I0423 02:22:33.859089 58529 sched.cpp:233] No master detected 2014-04-23 02:22:33,859:58498(0x7f0c08491940):ZOO_INFO@zookeeper_close@2522: Freeing zookeeper resources for sessionId=0x1458c6446740002 2014-04-23 02:22:33,859:58498(0x7f0c09c94940):ZOO_INFO@log_env@712: Client environment:zookeeper.version=zookeeper C client 3.4.5 2014-04-23 02:22:33,859:58498(0x7f0c0f49f940):ZOO_INFO@log_env@741: Client environment:user.home=/root 2014-04-23 02:22:33,859:58498(0x7f0c09c94940):ZOO_INFO@log_env@716: Client environment:host.name=smfd-atr-11-sr1.devel.twitter.com 2014-04-23 02:22:33,859:58498(0x7f0c09c94940):ZOO_INFO@log_env@723: Client environment:os.name=Linux I0423 02:22:33.859365 58527 slave.cpp:517] Lost leading master 2014-04-23 02:22:33,859:58498(0x7f0c08491940):ZOO_INFO@log_env@712: Client environment:zookeeper.version=zookeeper C client 3.4.5 2014-04-23 02:22:33,859:58498(0x7f0c0cc9a940):ZOO_INFO@zookeeper_close@2522: Freeing zookeeper resources for sessionId=0x1458c6446740000 2014-04-23 02:22:33,859:58498(0x7f0c09c94940):ZOO_INFO@log_env@724: Client environment:os.arch=2.6.44-t14.el5 I0423 02:22:33.859379 58518 leveldb.cpp:341] Persisting action (16 bytes) to leveldb took 4.705146ms 2014-04-23 02:22:33,859:58498(0x7f0c09c94940):ZOO_INFO@log_env@725: Client environment:os.version=#1 SMP Thu Dec 19 12:29:49 PST 2013 2014-04-23 02:22:33,859:58498(0x7f0c0f49f940):ZOO_INFO@log_env@753: Client environment:user.dir=/home/vinod/mesos/build 2014-04-23 02:22:33,859:58498(0x7f0c0d49b940):ZOO_INFO@zookeeper_close@2522: Freeing zookeeper resources for sessionId=0x1458c6446740005 I0423 02:22:33.859505 58527 slave.cpp:521] Detecting new master 2014-04-23 02:22:33,859:58498(0x7f0c0a495940):ZOO_INFO@zookeeper_close@2522: Freeing zookeeper resources for sessionId=0x1458c6446740003 2014-04-23 02:22:33,859:58498(0x7f0c0cc9a940):ZOO_INFO@log_env@712: Client environment:zookeeper.version=zookeeper C client 3.4.5 I0423 02:22:33.859549 58520 network.hpp:451] ZooKeeper group PIDs: { } I0423 02:22:33.859591 58518 replica.cpp:664] Persisted action at 4 2014-04-23 02:22:33,859:58498(0x7f0c09c94940):ZOO_INFO@log_env@733: Client environment:user.name=vinod 2014-04-23 02:22:33,859:58498(0x7f0c08491940):ZOO_INFO@log_env@716: Client environment:host.name=smfd-atr-11-sr1.devel.twitter.com 2014-04-23 02:22:33,859:58498(0x7f0c0d49b940):ZOO_INFO@log_env@712: Client environment:zookeeper.version=zookeeper C client 3.4.5 2014-04-23 02:22:33,859:58498(0x7f0c08491940):ZOO_INFO@log_env@723: Client environment:os.name=Linux 2014-04-23 02:22:33,859:58498(0x7f0c0a495940):ZOO_INFO@log_env@712: Client environment:zookeeper.version=zookeeper C client 3.4.5 2014-04-23 02:22:33,859:58498(0x7f0c08491940):ZOO_INFO@log_env@724: Client environment:os.arch=2.6.44-t14.el5 2014-04-23 02:22:33,859:58498(0x7f0c0f49f940):ZOO_INFO@zookeeper_init@786: Initiating client connection, host=127.0.0.1:52633 sessionTimeout=10000 watcher=0x7f0c18b10d5f sessionId=0 sessionPasswd=<null> context=0x1cc2d40 flags=0 2014-04-23 02:22:33,859:58498(0x7f0c08491940):ZOO_INFO@log_env@725: Client environment:os.version=#1 SMP Thu Dec 19 12:29:49 PST 2013 2014-04-23 02:22:33,859:58498(0x7f0c09c94940):ZOO_INFO@log_env@741: Client environment:user.home=/root 2014-04-23 02:22:33,859:58498(0x7f0c0cc9a940):ZOO_INFO@log_env@716: Client environment:host.name=smfd-atr-11-sr1.devel.twitter.com 2014-04-23 02:22:33,859:58498(0x7f0c0d49b940):ZOO_INFO@log_env@716: Client environment:host.name=smfd-atr-11-sr1.devel.twitter.com 2014-04-23 02:22:33,859:58498(0x7f0c0cc9a940):ZOO_INFO@log_env@723: Client environment:os.name=Linux 2014-04-23 02:22:33,859:58498(0x7f0c0d49b940):ZOO_INFO@log_env@723: Client environment:os.name=Linux 2014-04-23 02:22:33,859:58498(0x7f0c0d49b940):ZOO_INFO@log_env@724: Client environment:os.arch=2.6.44-t14.el5 2014-04-23 02:22:33,859:58498(0x7f0c0a495940):ZOO_INFO@log_env@716: Client environment:host.name=smfd-atr-11-sr1.devel.twitter.com 2014-04-23 02:22:33,859:58498(0x7f0c0d49b940):ZOO_INFO@log_env@725: Client environment:os.version=#1 SMP Thu Dec 19 12:29:49 PST 2013 2014-04-23 02:22:33,859:58498(0x7f0c0a495940):ZOO_INFO@log_env@723: Client environment:os.name=Linux 2014-04-23 02:22:33,859:58498(0x7f0c09c94940):ZOO_INFO@log_env@753: Client environment:user.dir=/home/vinod/mesos/build 2014-04-23 02:22:33,859:58498(0x7f0c08491940):ZOO_INFO@log_env@733: Client environment:user.name=vinod 2014-04-23 02:22:33,859:58498(0x7f0c0cc9a940):ZOO_INFO@log_env@724: Client environment:os.arch=2.6.44-t14.el5 2014-04-23 02:22:33,859:58498(0x7f0c0a495940):ZOO_INFO@log_env@724: Client environment:os.arch=2.6.44-t14.el5 2014-04-23 02:22:33,860:58498(0x7f0bc2270940):ZOO_ERROR@handle_socket_error_msg@1697: Socket [127.0.0.1:52633] zk retcode=-4, errno=111(Connection refused): server refused to accept the client 2014-04-23 02:22:33,860:58498(0x7f0c0a495940):ZOO_INFO@log_env@725: Client environment:os.version=#1 SMP Thu Dec 19 12:29:49 PST 2013 2014-04-23 02:22:33,860:58498(0x7f0c0cc9a940):ZOO_INFO@log_env@725: Client environment:os.version=#1 SMP Thu Dec 19 12:29:49 PST 2013 2014-04-23 02:22:33,859:58498(0x7f0c0d49b940):ZOO_INFO@log_env@733: Client environment:user.name=vinod 2014-04-23 02:22:33,860:58498(0x7f0c0cc9a940):ZOO_INFO@log_env@733: Client environment:user.name=vinod 2014-04-23 02:22:33,860:58498(0x7f0c09c94940):ZOO_INFO@zookeeper_init@786: Initiating client connection, host=127.0.0.1:52633 sessionTimeout=10000 watcher=0x7f0c18b10d5f sessionId=0 sessionPasswd=<null> context=0x7f0bbc05b7d0 flags=0 2014-04-23 02:22:33,860:58498(0x7f0c0a495940):ZOO_INFO@log_env@733: Client environment:user.name=vinod 2014-04-23 02:22:33,860:58498(0x7f0c08491940):ZOO_INFO@log_env@741: Client environment:user.home=/root 2014-04-23 02:22:33,860:58498(0x7f0c0d49b940):ZOO_INFO@log_env@741: Client environment:user.home=/root 2014-04-23 02:22:33,860:58498(0x7f0c08491940):ZOO_INFO@log_env@753: Client environment:user.dir=/home/vinod/mesos/build 2014-04-23 02:22:33,860:58498(0x7f0c0a495940):ZOO_INFO@log_env@741: Client environment:user.home=/root 2014-04-23 02:22:33,860:58498(0x7f0c0d49b940):ZOO_INFO@log_env@753: Client environment:user.dir=/home/vinod/mesos/build 2014-04-23 02:22:33,860:58498(0x7f0c0cc9a940):ZOO_INFO@log_env@741: Client environment:user.home=/root 2014-04-23 02:22:33,860:58498(0x7f0c0a495940):ZOO_INFO@log_env@753: Client environment:user.dir=/home/vinod/mesos/build 2014-04-23 02:22:33,860:58498(0x7f0c0d49b940):ZOO_INFO@zookeeper_init@786: Initiating client connection, host=127.0.0.1:52633 sessionTimeout=10000 watcher=0x7f0c18b10d5f sessionId=0 sessionPasswd=<null> context=0x7f0bf80080a0 flags=0 2014-04-23 02:22:33,860:58498(0x7f0c0cc9a940):ZOO_INFO@log_env@753: Client environment:user.dir=/home/vinod/mesos/build 2014-04-23 02:22:33,860:58498(0x7f0c0a495940):ZOO_INFO@zookeeper_init@786: Initiating client connection, host=127.0.0.1:52633 sessionTimeout=10000 watcher=0x7f0c18b10d5f sessionId=0 sessionPasswd=<null> context=0x1d2b120 flags=0 2014-04-23 02:22:33,860:58498(0x7f0c08491940):ZOO_INFO@zookeeper_init@786: Initiating client connection, host=127.0.0.1:52633 sessionTimeout=10000 watcher=0x7f0c18b10d5f sessionId=0 sessionPasswd=<null> context=0x7f0c0031cda0 flags=0 2014-04-23 02:22:33,860:58498(0x7f0bc4274940):ZOO_ERROR@handle_socket_error_msg@1697: Socket [127.0.0.1:52633] zk retcode=-4, errno=111(Connection refused): server refused to accept the client 2014-04-23 02:22:33,860:58498(0x7f0c0cc9a940):ZOO_INFO@zookeeper_init@786: Initiating client connection, host=127.0.0.1:52633 sessionTimeout=10000 watcher=0x7f0c18b10d5f sessionId=0 sessionPasswd=<null> context=0x7f0c0027bdf0 flags=0 F0423 02:22:33.860647 58518 coordinator.cpp:399] Check failed: !missing Not expecting local replica to be missing position 4 after the writing is done 2014-04-23 02:22:33,860:58498(0x7f0bc647a940):ZOO_ERROR@handle_socket_error_msg@1697: Socket [127.0.0.1:52633] zk retcode=-4, errno=111(Connection refused): server refused to accept the client 2014-04-23 02:22:33,860:58498(0x7f0bc4c77940):ZOO_ERROR@handle_socket_error_msg@1697: Socket [127.0.0.1:52633] zk retcode=-4, errno=111(Connection refused): server refused to accept the client 2014-04-23 02:22:33,860:58498(0x7f0bc1a6f940):ZOO_ERROR@handle_socket_error_msg@1697: Socket [127.0.0.1:52633] zk retcode=-4, errno=111(Connection refused): server refused to accept the client 2014-04-23 02:22:33,860:58498(0x7f0bc3272940):ZOO_ERROR@handle_socket_error_msg@1697: Socket [127.0.0.1:52633] zk retcode=-4, errno=111(Connection refused): server refused to accept the client *** Check failure stack trace: *** @ 0x7f0c18f91754 google::LogMessage::Fail() @ 0x7f0c18f916a3 google::LogMessage::SendToLog() @ 0x7f0c18f910b4 google::LogMessage::Flush() @ 0x7f0c18f93de8 google::LogMessageFatal::~LogMessageFatal() @ 0x7f0c18cbc0c5 mesos::internal::log::CoordinatorProcess::updateIndexAfterWritten() @ 0x7f0c18cc4bf7 _ZZN7process8dispatchI6OptionImEN5mesos8internal3log18CoordinatorProcessEbbEENS_6FutureIT_EERKNS_3PIDIT0_EEMSB_FS9_T1_ET2_ENKUlPNS_11ProcessBaseEE_clESK_ @ 0x7f0c18cd2227 _ZNSt17_Function_handlerIFvPN7process11ProcessBaseEEZNS0_8dispatchI6OptionImEN5mesos8internal3log18CoordinatorProcessEbbEENS0_6FutureIT_EERKNS0_3PIDIT0_EEMSF_FSD_T1_ET2_EUlS2_E_E9_M_invokeERKSt9_Any_dataS2_ @ 0x7f0c18e58b4d std::function<>::operator()() @ 0x7f0c18e38295 process::ProcessBase::visit() @ 0x7f0c18e42898 process::DispatchEvent::visit() @ 0x7ba322 process::ProcessBase::serve() @ 0x7f0c18e345a6 process::ProcessManager::resume() @ 0x7f0c18e2b36b process::schedule() @ 0x7f0c1699883d start_thread @ 0x7f0c156ff26d clone /var/tmp/sclUrjwpu: line 8: 58498 Aborted './bin/mesos-tests.sh' '--gtest_filter=*ZooKeeper*' '--verbose'