Description
Observed this in internal CI
DEBUG: [ RUN ] SlaveRecoveryTest/0.RecoverCompletedExecutor DEBUG: Using temporary directory '/tmp/SlaveRecoveryTest_0_RecoverCompletedExecutor_rTtR9B' DEBUG: I1020 08:56:36.634321 28115 leveldb.cpp:176] Opened db in 185.662339ms DEBUG: I1020 08:56:36.701638 28115 leveldb.cpp:183] Compacted db in 67.257643ms DEBUG: I1020 08:56:36.701705 28115 leveldb.cpp:198] Created db iterator in 8212ns DEBUG: I1020 08:56:36.701719 28115 leveldb.cpp:204] Seeked to beginning of db in 1417ns DEBUG: I1020 08:56:36.701730 28115 leveldb.cpp:273] Iterated through 0 keys in the db in 357ns DEBUG: I1020 08:56:36.701756 28115 replica.cpp:746] Replica recovered with log positions 0 -> 0 with 1 holes and 0 unlearned DEBUG: I1020 08:56:36.702062 28132 recover.cpp:449] Starting replica recovery DEBUG: I1020 08:56:36.702116 28132 recover.cpp:475] Replica is in EMPTY status DEBUG: I1020 08:56:36.702952 28132 replica.cpp:642] Replica in EMPTY status received a broadcasted recover request from (7143)@172.16.132.117:37586 DEBUG: I1020 08:56:36.703795 28141 recover.cpp:195] Received a recover response from a replica in EMPTY status DEBUG: I1020 08:56:36.704100 28138 recover.cpp:566] Updating replica status to STARTING DEBUG: I1020 08:56:36.705229 28133 master.cpp:376] Master 0d54e2f1-43d7-4f74-8532-9c37ac40791b (smfc-ahy-19-sr2.corpdc.twitter.com) started on 172.16.132.117:37586 DEBUG: I1020 08:56:36.705289 28133 master.cpp:378] Flags at startup: --acls="" --allocation_interval="1secs" --allocator="HierarchicalDRF" --authenticate="true" --authenticate_slaves="true" --authenticators="crammd5" --authorizers="local" --credentials="/tmp/SlaveRecoveryTest_0_RecoverCompletedExecutor_rTtR9B/credentials" --framework_sorter="drf" --help="false" --hostname_lookup="true" --initialize_driver_logging="true" --log_auto_initialize="true" --logbufsecs="0" --logging_level="INFO" --max_slave_ping_timeouts="5" --quiet="false" --recovery_slave_removal_limit="100%" --registry="replicated_log" --registry_fetch_timeout="1mins" --registry_store_timeout="25secs" --registry_strict="true" --root_submissions="true" --slave_ping_timeout="15secs" --slave_reregister_timeout="10mins" --user_sorter="drf" --version="false" --webui_dir="/usr/local/share/mesos/webui" --work_dir="/tmp/SlaveRecoveryTest_0_RecoverCompletedExecutor_rTtR9B/master" --zk_session_timeout="10secs" DEBUG: I1020 08:56:36.705440 28133 master.cpp:423] Master only allowing authenticated frameworks to register DEBUG: I1020 08:56:36.705446 28133 master.cpp:428] Master only allowing authenticated slaves to register DEBUG: I1020 08:56:36.705451 28133 credentials.hpp:37] Loading credentials for authentication from '/tmp/SlaveRecoveryTest_0_RecoverCompletedExecutor_rTtR9B/credentials' DEBUG: I1020 08:56:36.705587 28133 master.cpp:467] Using default 'crammd5' authenticator DEBUG: I1020 08:56:36.705651 28133 master.cpp:504] Authorization enabled DEBUG: I1020 08:56:36.706521 28134 master.cpp:1609] The newly elected leader is master@172.16.132.117:37586 with id 0d54e2f1-43d7-4f74-8532-9c37ac40791b DEBUG: I1020 08:56:36.706539 28134 master.cpp:1622] Elected as the leading master! DEBUG: I1020 08:56:36.706545 28134 master.cpp:1382] Recovering from registrar DEBUG: I1020 08:56:36.706681 28146 registrar.cpp:309] Recovering registrar DEBUG: I1020 08:56:36.768453 28138 leveldb.cpp:306] Persisting metadata (8 bytes) to leveldb took 64.300669ms DEBUG: I1020 08:56:36.768492 28138 replica.cpp:323] Persisted replica status to STARTING DEBUG: I1020 08:56:36.768568 28138 recover.cpp:475] Replica is in STARTING status DEBUG: I1020 08:56:36.769737 28131 replica.cpp:642] Replica in STARTING status received a broadcasted recover request from (7144)@172.16.132.117:37586 DEBUG: I1020 08:56:36.769816 28131 recover.cpp:195] Received a recover response from a replica in STARTING status DEBUG: I1020 08:56:36.770355 28141 recover.cpp:566] Updating replica status to VOTING DEBUG: I1020 08:56:36.818709 28136 leveldb.cpp:306] Persisting metadata (8 bytes) to leveldb took 48.054479ms DEBUG: I1020 08:56:36.818743 28136 replica.cpp:323] Persisted replica status to VOTING DEBUG: I1020 08:56:36.818791 28136 recover.cpp:580] Successfully joined the Paxos group DEBUG: I1020 08:56:36.818842 28136 recover.cpp:464] Recover process terminated DEBUG: I1020 08:56:36.818954 28130 log.cpp:661] Attempting to start the writer DEBUG: I1020 08:56:36.820060 28140 replica.cpp:478] Replica received implicit promise request from (7145)@172.16.132.117:37586 with proposal 1 DEBUG: I1020 08:56:36.885488 28140 leveldb.cpp:306] Persisting metadata (8 bytes) to leveldb took 65.391754ms DEBUG: I1020 08:56:36.885524 28140 replica.cpp:345] Persisted promised to 1 DEBUG: I1020 08:56:36.885898 28148 coordinator.cpp:231] Coordinator attemping to fill missing position DEBUG: I1020 08:56:36.888197 28143 replica.cpp:378] Replica received explicit promise request from (7146)@172.16.132.117:37586 for position 0 with proposal 2 DEBUG: I1020 08:56:36.952364 28143 leveldb.cpp:343] Persisting action (8 bytes) to leveldb took 64.11496ms DEBUG: I1020 08:56:36.952407 28143 replica.cpp:681] Persisted action at 0 DEBUG: I1020 08:56:36.953027 28139 replica.cpp:512] Replica received write request for position 0 from (7147)@172.16.132.117:37586 DEBUG: I1020 08:56:36.953059 28139 leveldb.cpp:438] Reading position from leveldb took 14203ns DEBUG: I1020 08:56:36.994161 28139 leveldb.cpp:343] Persisting action (14 bytes) to leveldb took 41.082908ms DEBUG: I1020 08:56:36.994199 28139 replica.cpp:681] Persisted action at 0 DEBUG: I1020 08:56:36.994415 28143 replica.cpp:660] Replica received learned notice for position 0 from @0.0.0.0:0 DEBUG: I1020 08:56:37.036006 28143 leveldb.cpp:343] Persisting action (16 bytes) to leveldb took 41.569884ms DEBUG: I1020 08:56:37.036043 28143 replica.cpp:681] Persisted action at 0 DEBUG: I1020 08:56:37.036056 28143 replica.cpp:666] Replica learned NOP action at position 0 DEBUG: I1020 08:56:37.036365 28138 log.cpp:677] Writer started with ending position 0 DEBUG: I1020 08:56:37.037250 28133 leveldb.cpp:438] Reading position from leveldb took 45165ns DEBUG: I1020 08:56:37.038321 28131 registrar.cpp:342] Successfully fetched the registry (0B) in 331.61216ms DEBUG: I1020 08:56:37.038378 28131 registrar.cpp:441] Applied 1 operations in 6704ns; attempting to update the 'registry' DEBUG: I1020 08:56:37.038996 28132 log.cpp:685] Attempting to append 228 bytes to the log DEBUG: I1020 08:56:37.039103 28132 coordinator.cpp:341] Coordinator attempting to write APPEND action at position 1 DEBUG: I1020 08:56:37.039717 28152 replica.cpp:512] Replica received write request for position 1 from (7148)@172.16.132.117:37586 DEBUG: I1020 08:56:37.111176 28152 leveldb.cpp:343] Persisting action (247 bytes) to leveldb took 71.428059ms DEBUG: I1020 08:56:37.111225 28152 replica.cpp:681] Persisted action at 1 DEBUG: I1020 08:56:37.111547 28138 replica.cpp:660] Replica received learned notice for position 1 from @0.0.0.0:0 DEBUG: I1020 08:56:37.177995 28138 leveldb.cpp:343] Persisting action (249 bytes) to leveldb took 66.426857ms DEBUG: I1020 08:56:37.178033 28138 replica.cpp:681] Persisted action at 1 DEBUG: I1020 08:56:37.178045 28138 replica.cpp:666] Replica learned APPEND action at position 1 DEBUG: I1020 08:56:37.178407 28137 registrar.cpp:486] Successfully updated the 'registry' in 139.99104ms DEBUG: I1020 08:56:37.178473 28137 registrar.cpp:372] Successfully recovered registrar DEBUG: I1020 08:56:37.178496 28146 log.cpp:704] Attempting to truncate the log to 1 DEBUG: I1020 08:56:37.178966 28145 master.cpp:1419] Recovered 0 slaves from the Registry (189B) ; allowing 10mins for slaves to re-register DEBUG: I1020 08:56:37.178951 28140 coordinator.cpp:341] Coordinator attempting to write TRUNCATE action at position 2 DEBUG: I1020 08:56:37.180043 28132 replica.cpp:512] Replica received write request for position 2 from (7149)@172.16.132.117:37586 DEBUG: I1020 08:56:37.219796 28132 leveldb.cpp:343] Persisting action (16 bytes) to leveldb took 39.72368ms DEBUG: I1020 08:56:37.219825 28132 replica.cpp:681] Persisted action at 2 DEBUG: I1020 08:56:37.219990 28132 replica.cpp:660] Replica received learned notice for position 2 from @0.0.0.0:0 DEBUG: I1020 08:56:37.278291 28132 leveldb.cpp:343] Persisting action (18 bytes) to leveldb took 58.280037ms DEBUG: I1020 08:56:37.278342 28132 leveldb.cpp:401] Deleting ~1 keys from leveldb took 21056ns DEBUG: I1020 08:56:37.278355 28132 replica.cpp:681] Persisted action at 2 DEBUG: I1020 08:56:37.278363 28132 replica.cpp:666] Replica learned TRUNCATE action at position 2 DEBUG: I1020 08:56:37.281244 28115 containerizer.cpp:143] Using isolation: posix/cpu,posix/mem,filesystem/posix DEBUG: W1020 08:56:37.281390 28115 backend.cpp:50] Failed to create 'bind' backend: BindBackend requires root privileges DEBUG: I1020 08:56:37.284584 28153 slave.cpp:191] Slave started on 255)@172.16.132.117:37586 DEBUG: I1020 08:56:37.284600 28153 slave.cpp:192] Flags at startup: --appc_store_dir="/tmp/mesos/store/appc" --authenticatee="crammd5" --cgroups_cpu_enable_pids_and_tids_count="false" --cgroups_enable_cfs="false" --cgroups_hierarchy="/sys/fs/cgroup" --cgroups_limit_swap="false" --cgroups_root="mesos" --container_disk_watch_interval="15secs" --containerizers="mesos" --credential="/tmp/SlaveRecoveryTest_0_RecoverCompletedExecutor_WO4xyx/credential" --default_role="*" --disk_watch_interval="1mins" --docker="docker" --docker_kill_orphans="true" --docker_local_archives_dir="/tmp/mesos/images/docker" --docker_puller="local" --docker_remove_delay="6hrs" --docker_socket="/var/run/docker.sock" --docker_stop_timeout="0ns" --docker_store_dir="/tmp/mesos/store/docker" --egress_unique_flow_per_container="false" --enforce_container_disk_quota="false" --ephemeral_ports_per_container="1024" --executor_registration_timeout="1mins" --executor_shutdown_grace_period="5secs" --fetcher_cache_dir="/tmp/SlaveRecoveryTest_0_RecoverCompletedExecutor_WO4xyx/fetch" --fetcher_cache_size="2GB" --frameworks_home="" --gc_delay="1weeks" --gc_disk_headroom="0.1" --hadoop_home="" --help="false" --hostname_lookup="true" --image_provisioner_backend="copy" --initialize_driver_logging="true" --isolation="posix/cpu,posix/mem" --launcher_dir="/builddir/build/BUILD/mesos-0.26.0/src" --logbufsecs="0" --logging_level="INFO" --network_enable_socket_statistics_details="false" --network_enable_socket_statistics_summary="false" --oversubscribed_resources_interval="15secs" --perf_duration="10secs" --perf_interval="1mins" --qos_correction_interval_min="0ns" --quiet="false" --recover="reconnect" --recovery_timeout="15mins" --registration_backoff_factor="10ms" --resources="cpus:2;mem:1024;disk:1024;ports:[31000-32000]" --revocable_cpu_low_priority="true" --sandbox_directory="/mnt/mesos/sandbox" --strict="true" --switch_user="true" --systemd_runtime_directory="/run/systemd/system" --version="false" --work_dir="/tmp/SlaveRecoveryTest_0_RecoverCompletedExecutor_WO4xyx" DEBUG: I1020 08:56:37.284816 28153 credentials.hpp:85] Loading credential for authentication from '/tmp/SlaveRecoveryTest_0_RecoverCompletedExecutor_WO4xyx/credential' DEBUG: I1020 08:56:37.284910 28153 slave.cpp:322] Slave using credential for: test-principal DEBUG: I1020 08:56:37.285879 28153 slave.cpp:392] Slave resources: cpus(*):2; mem(*):1024; disk(*):1024; ports(*):[31000-32000] DEBUG: I1020 08:56:37.285928 28153 slave.cpp:400] Slave attributes: [ ] DEBUG: I1020 08:56:37.285943 28153 slave.cpp:405] Slave hostname: smfc-ahy-19-sr2.corpdc.twitter.com DEBUG: I1020 08:56:37.285955 28153 slave.cpp:410] Slave checkpoint: true DEBUG: I1020 08:56:37.286746 28115 sched.cpp:166] Version: 0.26.0-rc0 DEBUG: I1020 08:56:37.286973 28151 state.cpp:54] Recovering state from '/tmp/SlaveRecoveryTest_0_RecoverCompletedExecutor_WO4xyx/meta' DEBUG: I1020 08:56:37.287063 28143 status_update_manager.cpp:202] Recovering status update manager DEBUG: I1020 08:56:37.287384 28137 containerizer.cpp:385] Recovering containerizer DEBUG: I1020 08:56:37.287608 28139 sched.cpp:264] New master detected at master@172.16.132.117:37586 DEBUG: I1020 08:56:37.287746 28139 sched.cpp:320] Authenticating with master master@172.16.132.117:37586 DEBUG: I1020 08:56:37.287757 28139 sched.cpp:327] Using default CRAM-MD5 authenticatee DEBUG: I1020 08:56:37.287824 28135 slave.cpp:4222] Finished recovery DEBUG: I1020 08:56:37.287967 28149 authenticatee.cpp:115] Creating new client SASL connection DEBUG: I1020 08:56:37.288192 28139 master.cpp:5144] Authenticating scheduler-560af120-1924-4956-b664-3c4a67d0fe93@172.16.132.117:37586 DEBUG: I1020 08:56:37.288382 28136 status_update_manager.cpp:176] Pausing sending status updates DEBUG: I1020 08:56:37.288444 28140 slave.cpp:726] New master detected at master@172.16.132.117:37586 DEBUG: I1020 08:56:37.288467 28140 slave.cpp:789] Authenticating with master master@172.16.132.117:37586 DEBUG: I1020 08:56:37.288475 28140 slave.cpp:794] Using default CRAM-MD5 authenticatee DEBUG: I1020 08:56:37.288508 28140 slave.cpp:762] Detecting new master DEBUG: I1020 08:56:37.288604 28130 authenticatee.cpp:115] Creating new client SASL connection DEBUG: I1020 08:56:37.288717 28140 authenticator.cpp:92] Creating new server SASL connection DEBUG: I1020 08:56:37.288785 28139 master.cpp:5144] Authenticating slave(255)@172.16.132.117:37586 DEBUG: I1020 08:56:37.288861 28135 authenticatee.cpp:206] Received SASL authentication mechanisms: CRAM-MD5 DEBUG: I1020 08:56:37.288882 28135 authenticatee.cpp:232] Attempting to authenticate with mechanism 'CRAM-MD5' DEBUG: I1020 08:56:37.289080 28133 authenticator.cpp:92] Creating new server SASL connection DEBUG: I1020 08:56:37.289175 28131 authenticatee.cpp:206] Received SASL authentication mechanisms: CRAM-MD5 DEBUG: I1020 08:56:37.289199 28131 authenticatee.cpp:232] Attempting to authenticate with mechanism 'CRAM-MD5' DEBUG: I1020 08:56:37.289355 28144 authenticator.cpp:197] Received SASL authentication start DEBUG: I1020 08:56:37.289371 28140 authenticator.cpp:197] Received SASL authentication start DEBUG: I1020 08:56:37.289440 28144 authenticator.cpp:319] Authentication requires more steps DEBUG: I1020 08:56:37.289459 28140 authenticator.cpp:319] Authentication requires more steps DEBUG: I1020 08:56:37.289480 28144 authenticatee.cpp:252] Received SASL authentication step DEBUG: I1020 08:56:37.289489 28140 authenticatee.cpp:252] Received SASL authentication step DEBUG: I1020 08:56:37.289813 28132 authenticator.cpp:225] Received SASL authentication step DEBUG: I1020 08:56:37.289852 28132 authenticator.cpp:311] Authentication success DEBUG: I1020 08:56:37.289898 28143 authenticatee.cpp:292] Authentication success DEBUG: I1020 08:56:37.289971 28142 master.cpp:5174] Successfully authenticated principal 'test-principal' at scheduler-560af120-1924-4956-b664-3c4a67d0fe93@172.16.132.117:37586 DEBUG: I1020 08:56:37.290009 28143 sched.cpp:409] Successfully authenticated with master master@172.16.132.117:37586 DEBUG: I1020 08:56:37.290163 28137 authenticator.cpp:225] Received SASL authentication step DEBUG: I1020 08:56:37.290220 28137 authenticator.cpp:311] Authentication success DEBUG: I1020 08:56:37.290266 28134 master.cpp:2185] Received SUBSCRIBE call for framework 'default' at scheduler-560af120-1924-4956-b664-3c4a67d0fe93@172.16.132.117:37586 DEBUG: I1020 08:56:37.290289 28134 master.cpp:1648] Authorizing framework principal 'test-principal' to receive offers for role '*' DEBUG: I1020 08:56:37.290405 28153 authenticatee.cpp:292] Authentication success DEBUG: I1020 08:56:37.290431 28144 master.cpp:5174] Successfully authenticated principal 'test-principal' at slave(255)@172.16.132.117:37586 DEBUG: I1020 08:56:37.290464 28144 master.cpp:2256] Subscribing framework default with checkpointing enabled and capabilities [ ] DEBUG: I1020 08:56:37.290654 28136 hierarchical.cpp:185] Added framework 0d54e2f1-43d7-4f74-8532-9c37ac40791b-0000 DEBUG: I1020 08:56:37.290650 28148 slave.cpp:857] Successfully authenticated with master master@172.16.132.117:37586 DEBUG: I1020 08:56:37.290817 28144 master.cpp:3868] Registering slave at slave(255)@172.16.132.117:37586 (smfc-ahy-19-sr2.corpdc.twitter.com) with id 0d54e2f1-43d7-4f74-8532-9c37ac40791b-S0 DEBUG: I1020 08:56:37.290873 28151 sched.cpp:643] Framework registered with 0d54e2f1-43d7-4f74-8532-9c37ac40791b-0000 DEBUG: I1020 08:56:37.290930 28130 registrar.cpp:441] Applied 1 operations in 16086ns; attempting to update the 'registry' DEBUG: I1020 08:56:37.291250 28151 log.cpp:685] Attempting to append 419 bytes to the log DEBUG: I1020 08:56:37.291321 28139 coordinator.cpp:341] Coordinator attempting to write APPEND action at position 3 DEBUG: I1020 08:56:37.291939 28131 replica.cpp:512] Replica received write request for position 3 from (7161)@172.16.132.117:37586 DEBUG: I1020 08:56:37.300041 28136 master.cpp:3856] Ignoring register slave message from slave(255)@172.16.132.117:37586 (smfc-ahy-19-sr2.corpdc.twitter.com) as admission is already in progress DEBUG: I1020 08:56:37.311420 28141 master.cpp:3856] Ignoring register slave message from slave(255)@172.16.132.117:37586 (smfc-ahy-19-sr2.corpdc.twitter.com) as admission is already in progress DEBUG: I1020 08:56:37.341991 28130 master.cpp:3856] Ignoring register slave message from slave(255)@172.16.132.117:37586 (smfc-ahy-19-sr2.corpdc.twitter.com) as admission is already in progress DEBUG: I1020 08:56:37.354004 28131 leveldb.cpp:343] Persisting action (438 bytes) to leveldb took 62.03948ms DEBUG: I1020 08:56:37.354042 28131 replica.cpp:681] Persisted action at 3 DEBUG: I1020 08:56:37.354390 28136 replica.cpp:660] Replica received learned notice for position 3 from @0.0.0.0:0 DEBUG: I1020 08:56:37.378435 28130 master.cpp:3856] Ignoring register slave message from slave(255)@172.16.132.117:37586 (smfc-ahy-19-sr2.corpdc.twitter.com) as admission is already in progress DEBUG: I1020 08:56:37.395829 28136 leveldb.cpp:343] Persisting action (440 bytes) to leveldb took 41.417781ms DEBUG: I1020 08:56:37.395867 28136 replica.cpp:681] Persisted action at 3 DEBUG: I1020 08:56:37.395879 28136 replica.cpp:666] Replica learned APPEND action at position 3 DEBUG: I1020 08:56:37.396229 28136 registrar.cpp:486] Successfully updated the 'registry' in 105.257984ms DEBUG: I1020 08:56:37.396466 28148 log.cpp:704] Attempting to truncate the log to 3 DEBUG: I1020 08:56:37.396519 28151 coordinator.cpp:341] Coordinator attempting to write TRUNCATE action at position 4 DEBUG: I1020 08:56:37.396827 28136 master.cpp:3936] Registered slave 0d54e2f1-43d7-4f74-8532-9c37ac40791b-S0 at slave(255)@172.16.132.117:37586 (smfc-ahy-19-sr2.corpdc.twitter.com) with cpus(*):2; mem(*):1024; disk(*):1024; ports(*):[31000-32000] DEBUG: I1020 08:56:37.396867 28137 slave.cpp:901] Registered with master master@172.16.132.117:37586; given slave ID 0d54e2f1-43d7-4f74-8532-9c37ac40791b-S0 DEBUG: I1020 08:56:37.396874 28152 hierarchical.cpp:335] Added slave 0d54e2f1-43d7-4f74-8532-9c37ac40791b-S0 (smfc-ahy-19-sr2.corpdc.twitter.com) with cpus(*):2; mem(*):1024; disk(*):1024; ports(*):[31000-32000] (allocated: ) DEBUG: I1020 08:56:37.397186 28150 master.cpp:4973] Sending 1 offers to framework 0d54e2f1-43d7-4f74-8532-9c37ac40791b-0000 (default) at scheduler-560af120-1924-4956-b664-3c4a67d0fe93@172.16.132.117:37586 DEBUG: I1020 08:56:37.397260 28132 status_update_manager.cpp:183] Resuming sending status updates DEBUG: I1020 08:56:37.397285 28136 replica.cpp:512] Replica received write request for position 4 from (7162)@172.16.132.117:37586 DEBUG: I1020 08:56:37.397516 28137 slave.cpp:960] Forwarding total oversubscribed resources DEBUG: I1020 08:56:37.397769 28150 master.cpp:4278] Received update of slave 0d54e2f1-43d7-4f74-8532-9c37ac40791b-S0 at slave(255)@172.16.132.117:37586 (smfc-ahy-19-sr2.corpdc.twitter.com) with total oversubscribed resources DEBUG: I1020 08:56:37.397902 28149 hierarchical.cpp:391] Slave 0d54e2f1-43d7-4f74-8532-9c37ac40791b-S0 (smfc-ahy-19-sr2.corpdc.twitter.com) updated with oversubscribed resources (total: cpus(*):2; mem(*):1024; disk(*):1024; ports(*):[31000-32000], allocated: cpus(*):2; mem(*):1024; disk(*):1024; ports(*):[31000-32000]) DEBUG: I1020 08:56:37.399044 28152 master.cpp:2924] Processing ACCEPT call for offers: [ 0d54e2f1-43d7-4f74-8532-9c37ac40791b-O0 ] on slave 0d54e2f1-43d7-4f74-8532-9c37ac40791b-S0 at slave(255)@172.16.132.117:37586 (smfc-ahy-19-sr2.corpdc.twitter.com) for framework 0d54e2f1-43d7-4f74-8532-9c37ac40791b-0000 (default) at scheduler-560af120-1924-4956-b664-3c4a67d0fe93@172.16.132.117:37586 DEBUG: I1020 08:56:37.399066 28152 master.cpp:2720] Authorizing framework principal 'test-principal' to launch task 0bbb69a3-3ba3-4016-86ed-4d80c1bf594b as user 'mockbuild' DEBUG: I1020 08:56:37.399948 28147 master.hpp:176] Adding task 0bbb69a3-3ba3-4016-86ed-4d80c1bf594b with resources cpus(*):2; mem(*):1024; disk(*):1024; ports(*):[31000-32000] on slave 0d54e2f1-43d7-4f74-8532-9c37ac40791b-S0 (smfc-ahy-19-sr2.corpdc.twitter.com) DEBUG: I1020 08:56:37.399991 28147 master.cpp:3254] Launching task 0bbb69a3-3ba3-4016-86ed-4d80c1bf594b of framework 0d54e2f1-43d7-4f74-8532-9c37ac40791b-0000 (default) at scheduler-560af120-1924-4956-b664-3c4a67d0fe93@172.16.132.117:37586 with resources cpus(*):2; mem(*):1024; disk(*):1024; ports(*):[31000-32000] on slave 0d54e2f1-43d7-4f74-8532-9c37ac40791b-S0 at slave(255)@172.16.132.117:37586 (smfc-ahy-19-sr2.corpdc.twitter.com) DEBUG: I1020 08:56:37.400190 28147 slave.cpp:1291] Got assigned task 0bbb69a3-3ba3-4016-86ed-4d80c1bf594b for framework 0d54e2f1-43d7-4f74-8532-9c37ac40791b-0000 DEBUG: I1020 08:56:37.409716 28147 slave.cpp:1407] Launching task 0bbb69a3-3ba3-4016-86ed-4d80c1bf594b for framework 0d54e2f1-43d7-4f74-8532-9c37ac40791b-0000 DEBUG: I1020 08:56:37.410130 28147 paths.cpp:417] Trying to chown '/tmp/SlaveRecoveryTest_0_RecoverCompletedExecutor_WO4xyx/slaves/0d54e2f1-43d7-4f74-8532-9c37ac40791b-S0/frameworks/0d54e2f1-43d7-4f74-8532-9c37ac40791b-0000/executors/0bbb69a3-3ba3-4016-86ed-4d80c1bf594b/runs/2b1371cc-4d78-44d1-b1f1-e1b29127adc8' to user 'mockbuild' DEBUG: I1020 08:56:37.418064 28147 slave.cpp:4994] Launching executor 0bbb69a3-3ba3-4016-86ed-4d80c1bf594b of framework 0d54e2f1-43d7-4f74-8532-9c37ac40791b-0000 with resources cpus(*):0.1; mem(*):32 in work directory '/tmp/SlaveRecoveryTest_0_RecoverCompletedExecutor_WO4xyx/slaves/0d54e2f1-43d7-4f74-8532-9c37ac40791b-S0/frameworks/0d54e2f1-43d7-4f74-8532-9c37ac40791b-0000/executors/0bbb69a3-3ba3-4016-86ed-4d80c1bf594b/runs/2b1371cc-4d78-44d1-b1f1-e1b29127adc8' DEBUG: I1020 08:56:37.418360 28137 containerizer.cpp:639] Starting container '2b1371cc-4d78-44d1-b1f1-e1b29127adc8' for executor '0bbb69a3-3ba3-4016-86ed-4d80c1bf594b' of framework '0d54e2f1-43d7-4f74-8532-9c37ac40791b-0000' DEBUG: I1020 08:56:37.418512 28147 slave.cpp:1625] Queuing task '0bbb69a3-3ba3-4016-86ed-4d80c1bf594b' for executor 0bbb69a3-3ba3-4016-86ed-4d80c1bf594b of framework '0d54e2f1-43d7-4f74-8532-9c37ac40791b-0000 DEBUG: I1020 08:56:37.421762 28143 launcher.cpp:132] Forked child with pid '40438' for container '2b1371cc-4d78-44d1-b1f1-e1b29127adc8' DEBUG: I1020 08:56:37.421979 28143 containerizer.cpp:872] Checkpointing executor's forked pid 40438 to '/tmp/SlaveRecoveryTest_0_RecoverCompletedExecutor_WO4xyx/meta/slaves/0d54e2f1-43d7-4f74-8532-9c37ac40791b-S0/frameworks/0d54e2f1-43d7-4f74-8532-9c37ac40791b-0000/executors/0bbb69a3-3ba3-4016-86ed-4d80c1bf594b/runs/2b1371cc-4d78-44d1-b1f1-e1b29127adc8/pids/forked.pid' DEBUG: I1020 08:56:37.463163 28136 leveldb.cpp:343] Persisting action (16 bytes) to leveldb took 65.83211ms DEBUG: I1020 08:56:37.463227 28136 replica.cpp:681] Persisted action at 4 DEBUG: I1020 08:56:37.464862 28136 replica.cpp:660] Replica received learned notice for position 4 from @0.0.0.0:0 DEBUG: I1020 08:56:37.504696 40438 exec.cpp:136] Version: 0.26.0-rc0 DEBUG: I1020 08:56:37.506574 28150 slave.cpp:2415] Got registration for executor '0bbb69a3-3ba3-4016-86ed-4d80c1bf594b' of framework 0d54e2f1-43d7-4f74-8532-9c37ac40791b-0000 from executor(1)@172.16.132.117:41415 DEBUG: I1020 08:56:37.575783 40482 exec.cpp:210] Executor registered on slave 0d54e2f1-43d7-4f74-8532-9c37ac40791b-S0 DEBUG: I1020 08:56:37.576037 28147 slave.cpp:1796] Sending queued task '0bbb69a3-3ba3-4016-86ed-4d80c1bf594b' to executor '0bbb69a3-3ba3-4016-86ed-4d80c1bf594b' of framework 0d54e2f1-43d7-4f74-8532-9c37ac40791b-0000 DEBUG: Registered executor on smfc-ahy-19-sr2.corpdc.twitter.com DEBUG: Starting task 0bbb69a3-3ba3-4016-86ed-4d80c1bf594b DEBUG: sh -c 'exit 0' DEBUG: Forked command at 40500 DEBUG: I1020 08:56:37.580664 28139 slave.cpp:2778] Handling status update TASK_RUNNING (UUID: fb1a8b73-5136-4a8f-ae92-9d26b13fb41b) for task 0bbb69a3-3ba3-4016-86ed-4d80c1bf594b of framework 0d54e2f1-43d7-4f74-8532-9c37ac40791b-0000 from executor(1)@172.16.132.117:41415 DEBUG: I1020 08:56:37.580994 28146 status_update_manager.cpp:322] Received status update TASK_RUNNING (UUID: fb1a8b73-5136-4a8f-ae92-9d26b13fb41b) for task 0bbb69a3-3ba3-4016-86ed-4d80c1bf594b of framework 0d54e2f1-43d7-4f74-8532-9c37ac40791b-0000 DEBUG: I1020 08:56:37.581665 28146 status_update_manager.cpp:826] Checkpointing UPDATE for status update TASK_RUNNING (UUID: fb1a8b73-5136-4a8f-ae92-9d26b13fb41b) for task 0bbb69a3-3ba3-4016-86ed-4d80c1bf594b of framework 0d54e2f1-43d7-4f74-8532-9c37ac40791b-0000 DEBUG: I1020 08:56:37.623446 28136 leveldb.cpp:343] Persisting action (18 bytes) to leveldb took 158.550858ms DEBUG: Command exited with status 0 (pid: 40500) DEBUG: I1020 08:56:37.623538 28136 leveldb.cpp:401] Deleting ~2 keys from leveldb took 41621ns DEBUG: I1020 08:56:37.623558 28136 replica.cpp:681] Persisted action at 4 DEBUG: I1020 08:56:37.623571 28136 replica.cpp:666] Replica learned TRUNCATE action at position 4 DEBUG: I1020 08:56:37.681676 28145 slave.cpp:2778] Handling status update TASK_FINISHED (UUID: 4a38e137-c590-48f6-ba10-2e0ba4903ac5) for task 0bbb69a3-3ba3-4016-86ed-4d80c1bf594b of framework 0d54e2f1-43d7-4f74-8532-9c37ac40791b-0000 from executor(1)@172.16.132.117:41415 DEBUG: I1020 08:56:37.840840 28146 status_update_manager.cpp:322] Received status update TASK_FINISHED (UUID: 4a38e137-c590-48f6-ba10-2e0ba4903ac5) for task 0bbb69a3-3ba3-4016-86ed-4d80c1bf594b of framework 0d54e2f1-43d7-4f74-8532-9c37ac40791b-0000 DEBUG: I1020 08:56:37.840920 28146 status_update_manager.cpp:826] Checkpointing UPDATE for status update TASK_FINISHED (UUID: 4a38e137-c590-48f6-ba10-2e0ba4903ac5) for task 0bbb69a3-3ba3-4016-86ed-4d80c1bf594b of framework 0d54e2f1-43d7-4f74-8532-9c37ac40791b-0000 DEBUG: I1020 08:56:37.840901 28148 slave.cpp:3090] Forwarding the update TASK_RUNNING (UUID: fb1a8b73-5136-4a8f-ae92-9d26b13fb41b) for task 0bbb69a3-3ba3-4016-86ed-4d80c1bf594b of framework 0d54e2f1-43d7-4f74-8532-9c37ac40791b-0000 to master@172.16.132.117:37586 DEBUG: I1020 08:56:37.841084 28148 slave.cpp:3020] Sending acknowledgement for status update TASK_RUNNING (UUID: fb1a8b73-5136-4a8f-ae92-9d26b13fb41b) for task 0bbb69a3-3ba3-4016-86ed-4d80c1bf594b of framework 0d54e2f1-43d7-4f74-8532-9c37ac40791b-0000 to executor(1)@172.16.132.117:41415 DEBUG: I1020 08:56:37.841506 28153 master.cpp:4421] Status update TASK_RUNNING (UUID: fb1a8b73-5136-4a8f-ae92-9d26b13fb41b) for task 0bbb69a3-3ba3-4016-86ed-4d80c1bf594b of framework 0d54e2f1-43d7-4f74-8532-9c37ac40791b-0000 from slave 0d54e2f1-43d7-4f74-8532-9c37ac40791b-S0 at slave(255)@172.16.132.117:37586 (smfc-ahy-19-sr2.corpdc.twitter.com) DEBUG: I1020 08:56:37.841531 28153 master.cpp:4460] Forwarding status update TASK_RUNNING (UUID: fb1a8b73-5136-4a8f-ae92-9d26b13fb41b) for task 0bbb69a3-3ba3-4016-86ed-4d80c1bf594b of framework 0d54e2f1-43d7-4f74-8532-9c37ac40791b-0000 DEBUG: I1020 08:56:37.841815 28153 master.cpp:6086] Updating the latest state of task 0bbb69a3-3ba3-4016-86ed-4d80c1bf594b of framework 0d54e2f1-43d7-4f74-8532-9c37ac40791b-0000 to TASK_FINISHED (status update state: TASK_RUNNING) DEBUG: I1020 08:56:37.842349 28131 hierarchical.cpp:739] Recovered cpus(*):2; mem(*):1024; disk(*):1024; ports(*):[31000-32000] (total: cpus(*):2; mem(*):1024; disk(*):1024; ports(*):[31000-32000], allocated: ) on slave 0d54e2f1-43d7-4f74-8532-9c37ac40791b-S0 from framework 0d54e2f1-43d7-4f74-8532-9c37ac40791b-0000 DEBUG: I1020 08:56:37.843065 28152 master.cpp:3612] Processing ACKNOWLEDGE call fb1a8b73-5136-4a8f-ae92-9d26b13fb41b for task 0bbb69a3-3ba3-4016-86ed-4d80c1bf594b of framework 0d54e2f1-43d7-4f74-8532-9c37ac40791b-0000 (default) at scheduler-560af120-1924-4956-b664-3c4a67d0fe93@172.16.132.117:37586 on slave 0d54e2f1-43d7-4f74-8532-9c37ac40791b-S0 DEBUG: I1020 08:56:37.932744 28146 status_update_manager.cpp:394] Received status update acknowledgement (UUID: fb1a8b73-5136-4a8f-ae92-9d26b13fb41b) for task 0bbb69a3-3ba3-4016-86ed-4d80c1bf594b of framework 0d54e2f1-43d7-4f74-8532-9c37ac40791b-0000 DEBUG: I1020 08:56:37.932793 28146 status_update_manager.cpp:826] Checkpointing ACK for status update TASK_RUNNING (UUID: fb1a8b73-5136-4a8f-ae92-9d26b13fb41b) for task 0bbb69a3-3ba3-4016-86ed-4d80c1bf594b of framework 0d54e2f1-43d7-4f74-8532-9c37ac40791b-0000 DEBUG: I1020 08:56:37.932839 28152 slave.cpp:3020] Sending acknowledgement for status update TASK_FINISHED (UUID: 4a38e137-c590-48f6-ba10-2e0ba4903ac5) for task 0bbb69a3-3ba3-4016-86ed-4d80c1bf594b of framework 0d54e2f1-43d7-4f74-8532-9c37ac40791b-0000 to executor(1)@172.16.132.117:41415 DEBUG: 2015-10-20 08:56:37,948:28115(0x7f526963c700):ZOO_ERROR@handle_socket_error_msg@1697: Socket [127.0.0.1:34917] zk retcode=-4, errno=111(Connection refused): server refused to accept the client DEBUG: I1020 08:56:38.091502 28144 slave.cpp:3090] Forwarding the update TASK_FINISHED (UUID: 4a38e137-c590-48f6-ba10-2e0ba4903ac5) for task 0bbb69a3-3ba3-4016-86ed-4d80c1bf594b of framework 0d54e2f1-43d7-4f74-8532-9c37ac40791b-0000 to master@172.16.132.117:37586 DEBUG: I1020 08:56:38.091954 28152 master.cpp:4421] Status update TASK_FINISHED (UUID: 4a38e137-c590-48f6-ba10-2e0ba4903ac5) for task 0bbb69a3-3ba3-4016-86ed-4d80c1bf594b of framework 0d54e2f1-43d7-4f74-8532-9c37ac40791b-0000 from slave 0d54e2f1-43d7-4f74-8532-9c37ac40791b-S0 at slave(255)@172.16.132.117:37586 (smfc-ahy-19-sr2.corpdc.twitter.com) DEBUG: I1020 08:56:38.092000 28152 master.cpp:4460] Forwarding status update TASK_FINISHED (UUID: 4a38e137-c590-48f6-ba10-2e0ba4903ac5) for task 0bbb69a3-3ba3-4016-86ed-4d80c1bf594b of framework 0d54e2f1-43d7-4f74-8532-9c37ac40791b-0000 DEBUG: E1020 08:56:38.092056 28152 master.cpp:6032] Ignoring status update for the terminated task 0bbb69a3-3ba3-4016-86ed-4d80c1bf594b (TASK_FINISHED -> TASK_FINISHED) of framework 0d54e2f1-43d7-4f74-8532-9c37ac40791b-0000 DEBUG: I1020 08:56:38.092725 28153 master.cpp:3612] Processing ACKNOWLEDGE call 4a38e137-c590-48f6-ba10-2e0ba4903ac5 for task 0bbb69a3-3ba3-4016-86ed-4d80c1bf594b of framework 0d54e2f1-43d7-4f74-8532-9c37ac40791b-0000 (default) at scheduler-560af120-1924-4956-b664-3c4a67d0fe93@172.16.132.117:37586 on slave 0d54e2f1-43d7-4f74-8532-9c37ac40791b-S0 DEBUG: I1020 08:56:38.092818 28153 status_update_manager.cpp:394] Received status update acknowledgement (UUID: 4a38e137-c590-48f6-ba10-2e0ba4903ac5) for task 0bbb69a3-3ba3-4016-86ed-4d80c1bf594b of framework 0d54e2f1-43d7-4f74-8532-9c37ac40791b-0000 DEBUG: I1020 08:56:38.092852 28153 status_update_manager.cpp:826] Checkpointing ACK for status update TASK_FINISHED (UUID: 4a38e137-c590-48f6-ba10-2e0ba4903ac5) for task 0bbb69a3-3ba3-4016-86ed-4d80c1bf594b of framework 0d54e2f1-43d7-4f74-8532-9c37ac40791b-0000 DEBUG: I1020 08:56:38.691695 28137 containerizer.cpp:1278] Executor for container '2b1371cc-4d78-44d1-b1f1-e1b29127adc8' has exited DEBUG: I1020 08:56:38.691723 28137 containerizer.cpp:1095] Destroying container '2b1371cc-4d78-44d1-b1f1-e1b29127adc8' DEBUG: I1020 08:56:38.708005 28138 master.cpp:4973] Sending 1 offers to framework 0d54e2f1-43d7-4f74-8532-9c37ac40791b-0000 (default) at scheduler-560af120-1924-4956-b664-3c4a67d0fe93@172.16.132.117:37586 DEBUG: I1020 08:56:38.715711 28141 slave.cpp:3542] Executor '0bbb69a3-3ba3-4016-86ed-4d80c1bf594b' of framework 0d54e2f1-43d7-4f74-8532-9c37ac40791b-0000 exited with status 0 DEBUG: I1020 08:56:38.715741 28141 slave.cpp:3646] Cleaning up executor '0bbb69a3-3ba3-4016-86ed-4d80c1bf594b' of framework 0d54e2f1-43d7-4f74-8532-9c37ac40791b-0000 DEBUG: tests/slave_recovery_tests.cpp:881: Failure DEBUG: Mock function called more times than expected - returning directly. DEBUG: Function call: statusUpdate(0x7fffabf3b0c0, @0x7f52f0b9d740 112-byte object <F0-C1 40-FC 52-7F 00-00 00-00 00-00 00-00 00-00 B0-24 01-AC 52-7F 00-00 00-95 01-AC 52-7F 00-00 05-00 00-00 01-00 00-00 80-28 40-01 00-00 00-00 40-AF 01-AC 52-7F 00-00 00-00 00-00 00-00 00-00 09-00 00-00 00-00 00-00 8B-FB 6F-FD 6F-90 D5-41 20-AF 01-AC 52-7F 00-00 00-00 00-00 00-00 00-00 00-00 00-00 00-00 00-00 00-00 00-00 5F-03 00-00>) DEBUG: Expected: to be called twice DEBUG: Actual: called 3 times - over-saturated and active DEBUG: I1020 08:56:38.716214 28141 slave.cpp:3735] Cleaning up framework 0d54e2f1-43d7-4f74-8532-9c37ac40791b-0000 DEBUG: I1020 08:56:38.716245 28131 gc.cpp:56] Scheduling '/tmp/SlaveRecoveryTest_0_RecoverCompletedExecutor_WO4xyx/slaves/0d54e2f1-43d7-4f74-8532-9c37ac40791b-S0/frameworks/0d54e2f1-43d7-4f74-8532-9c37ac40791b-0000/executors/0bbb69a3-3ba3-4016-86ed-4d80c1bf594b/runs/2b1371cc-4d78-44d1-b1f1-e1b29127adc8' for gc 6.99999171211852days in the future DEBUG: I1020 08:56:38.716320 28151 status_update_manager.cpp:284] Closing status update streams for framework 0d54e2f1-43d7-4f74-8532-9c37ac40791b-0000 DEBUG: I1020 08:56:38.716517 28131 gc.cpp:56] Scheduling '/tmp/SlaveRecoveryTest_0_RecoverCompletedExecutor_WO4xyx/slaves/0d54e2f1-43d7-4f74-8532-9c37ac40791b-S0/frameworks/0d54e2f1-43d7-4f74-8532-9c37ac40791b-0000/executors/0bbb69a3-3ba3-4016-86ed-4d80c1bf594b' for gc 6.99999171165333days in the future DEBUG: I1020 08:56:38.716562 28131 gc.cpp:56] Scheduling '/tmp/SlaveRecoveryTest_0_RecoverCompletedExecutor_WO4xyx/meta/slaves/0d54e2f1-43d7-4f74-8532-9c37ac40791b-S0/frameworks/0d54e2f1-43d7-4f74-8532-9c37ac40791b-0000/executors/0bbb69a3-3ba3-4016-86ed-4d80c1bf594b/runs/2b1371cc-4d78-44d1-b1f1-e1b29127adc8' for gc 6.99999171135407days in the future DEBUG: I1020 08:56:38.716585 28131 gc.cpp:56] Scheduling '/tmp/SlaveRecoveryTest_0_RecoverCompletedExecutor_WO4xyx/meta/slaves/0d54e2f1-43d7-4f74-8532-9c37ac40791b-S0/frameworks/0d54e2f1-43d7-4f74-8532-9c37ac40791b-0000/executors/0bbb69a3-3ba3-4016-86ed-4d80c1bf594b' for gc 6.99999171074074days in the future DEBUG: I1020 08:56:38.716604 28131 gc.cpp:56] Scheduling '/tmp/SlaveRecoveryTest_0_RecoverCompletedExecutor_WO4xyx/slaves/0d54e2f1-43d7-4f74-8532-9c37ac40791b-S0/frameworks/0d54e2f1-43d7-4f74-8532-9c37ac40791b-0000' for gc 6.99999171001185days in the future DEBUG: I1020 08:56:38.716624 28131 gc.cpp:56] Scheduling '/tmp/SlaveRecoveryTest_0_RecoverCompletedExecutor_WO4xyx/meta/slaves/0d54e2f1-43d7-4f74-8532-9c37ac40791b-S0/frameworks/0d54e2f1-43d7-4f74-8532-9c37ac40791b-0000' for gc 6.9999917082637days in the future DEBUG: I1020 08:56:38.716841 28115 slave.cpp:606] Slave terminating DEBUG: I1020 08:56:38.716974 28150 master.cpp:1086] Slave 0d54e2f1-43d7-4f74-8532-9c37ac40791b-S0 at slave(255)@172.16.132.117:37586 (smfc-ahy-19-sr2.corpdc.twitter.com) disconnected DEBUG: I1020 08:56:38.716994 28150 master.cpp:2540] Disconnecting slave 0d54e2f1-43d7-4f74-8532-9c37ac40791b-S0 at slave(255)@172.16.132.117:37586 (smfc-ahy-19-sr2.corpdc.twitter.com) DEBUG: I1020 08:56:38.717082 28150 master.cpp:2559] Deactivating slave 0d54e2f1-43d7-4f74-8532-9c37ac40791b-S0 at slave(255)@172.16.132.117:37586 (smfc-ahy-19-sr2.corpdc.twitter.com) DEBUG: I1020 08:56:38.717128 28136 hierarchical.cpp:420] Slave 0d54e2f1-43d7-4f74-8532-9c37ac40791b-S0 deactivated DEBUG: I1020 08:56:38.717308 28136 hierarchical.cpp:739] Recovered cpus(*):2; mem(*):1024; disk(*):1024; ports(*):[31000-32000] (total: cpus(*):2; mem(*):1024; disk(*):1024; ports(*):[31000-32000], allocated: ) on slave 0d54e2f1-43d7-4f74-8532-9c37ac40791b-S0 from framework 0d54e2f1-43d7-4f74-8532-9c37ac40791b-0000 DEBUG: I1020 08:56:38.718613 28115 containerizer.cpp:143] Using isolation: posix/cpu,posix/mem,filesystem/posix DEBUG: W1020 08:56:38.718729 28115 backend.cpp:50] Failed to create 'bind' backend: BindBackend requires root privileges DEBUG: I1020 08:56:38.720690 28142 slave.cpp:191] Slave started on 256)@172.16.132.117:37586 DEBUG: I1020 08:56:38.720705 28142 slave.cpp:192] Flags at startup: --appc_store_dir="/tmp/mesos/store/appc" --authenticatee="crammd5" --cgroups_cpu_enable_pids_and_tids_count="false" --cgroups_enable_cfs="false" --cgroups_hierarchy="/sys/fs/cgroup" --cgroups_limit_swap="false" --cgroups_root="mesos" --container_disk_watch_interval="15secs" --containerizers="mesos" --credential="/tmp/SlaveRecoveryTest_0_RecoverCompletedExecutor_WO4xyx/credential" --default_role="*" --disk_watch_interval="1mins" --docker="docker" --docker_kill_orphans="true" --docker_local_archives_dir="/tmp/mesos/images/docker" --docker_puller="local" --docker_remove_delay="6hrs" --docker_socket="/var/run/docker.sock" --docker_stop_timeout="0ns" --docker_store_dir="/tmp/mesos/store/docker" --egress_unique_flow_per_container="false" --enforce_container_disk_quota="false" --ephemeral_ports_per_container="1024" --executor_registration_timeout="1mins" --executor_shutdown_grace_period="5secs" --fetcher_cache_dir="/tmp/SlaveRecoveryTest_0_RecoverCompletedExecutor_WO4xyx/fetch" --fetcher_cache_size="2GB" --frameworks_home="" --gc_delay="1weeks" --gc_disk_headroom="0.1" --hadoop_home="" --help="false" --hostname_lookup="true" --image_provisioner_backend="copy" --initialize_driver_logging="true" --isolation="posix/cpu,posix/mem" --launcher_dir="/builddir/build/BUILD/mesos-0.26.0/src" --logbufsecs="0" --logging_level="INFO" --network_enable_socket_statistics_details="false" --network_enable_socket_statistics_summary="false" --oversubscribed_resources_interval="15secs" --perf_duration="10secs" --perf_interval="1mins" --qos_correction_interval_min="0ns" --quiet="false" --recover="reconnect" --recovery_timeout="15mins" --registration_backoff_factor="10ms" --resources="cpus:2;mem:1024;disk:1024;ports:[31000-32000]" --revocable_cpu_low_priority="true" --sandbox_directory="/mnt/mesos/sandbox" --strict="true" --switch_user="true" --systemd_runtime_directory="/run/systemd/system" --version="false" --work_dir="/tmp/SlaveRecoveryTest_0_RecoverCompletedExecutor_WO4xyx" DEBUG: I1020 08:56:38.720965 28142 credentials.hpp:85] Loading credential for authentication from '/tmp/SlaveRecoveryTest_0_RecoverCompletedExecutor_WO4xyx/credential' DEBUG: I1020 08:56:38.721045 28142 slave.cpp:322] Slave using credential for: test-principal DEBUG: I1020 08:56:38.721573 28142 slave.cpp:392] Slave resources: cpus(*):2; mem(*):1024; disk(*):1024; ports(*):[31000-32000] DEBUG: I1020 08:56:38.721603 28142 slave.cpp:400] Slave attributes: [ ] DEBUG: I1020 08:56:38.721609 28142 slave.cpp:405] Slave hostname: smfc-ahy-19-sr2.corpdc.twitter.com DEBUG: I1020 08:56:38.721613 28142 slave.cpp:410] Slave checkpoint: true DEBUG: I1020 08:56:38.721895 28131 state.cpp:54] Recovering state from '/tmp/SlaveRecoveryTest_0_RecoverCompletedExecutor_WO4xyx/meta' DEBUG: I1020 08:56:38.721935 28131 state.cpp:681] No checkpointed resources found at '/tmp/SlaveRecoveryTest_0_RecoverCompletedExecutor_WO4xyx/meta/resources/resources.info' DEBUG: I1020 08:56:38.723804 28131 slave.cpp:4310] Recovering framework 0d54e2f1-43d7-4f74-8532-9c37ac40791b-0000 DEBUG: I1020 08:56:38.723827 28131 slave.cpp:5103] Recovering executor '0bbb69a3-3ba3-4016-86ed-4d80c1bf594b' of framework 0d54e2f1-43d7-4f74-8532-9c37ac40791b-0000 DEBUG: I1020 08:56:38.724068 28142 gc.cpp:56] Scheduling '/tmp/SlaveRecoveryTest_0_RecoverCompletedExecutor_WO4xyx/slaves/0d54e2f1-43d7-4f74-8532-9c37ac40791b-S0/frameworks/0d54e2f1-43d7-4f74-8532-9c37ac40791b-0000/executors/0bbb69a3-3ba3-4016-86ed-4d80c1bf594b/runs/2b1371cc-4d78-44d1-b1f1-e1b29127adc8' for gc 6.99999162064889days in the future DEBUG: I1020 08:56:38.724123 28142 gc.cpp:56] Scheduling '/tmp/SlaveRecoveryTest_0_RecoverCompletedExecutor_WO4xyx/meta/slaves/0d54e2f1-43d7-4f74-8532-9c37ac40791b-S0/frameworks/0d54e2f1-43d7-4f74-8532-9c37ac40791b-0000/executors/0bbb69a3-3ba3-4016-86ed-4d80c1bf594b/runs/2b1371cc-4d78-44d1-b1f1-e1b29127adc8' for gc 6.99999161984889days in the future DEBUG: I1020 08:56:38.724180 28142 gc.cpp:56] Scheduling '/tmp/SlaveRecoveryTest_0_RecoverCompletedExecutor_WO4xyx/slaves/0d54e2f1-43d7-4f74-8532-9c37ac40791b-S0/frameworks/0d54e2f1-43d7-4f74-8532-9c37ac40791b-0000/executors/0bbb69a3-3ba3-4016-86ed-4d80c1bf594b' for gc 6.99999161965333days in the future DEBUG: I1020 08:56:38.724195 28131 slave.cpp:3735] Cleaning up framework 0d54e2f1-43d7-4f74-8532-9c37ac40791b-0000 DEBUG: I1020 08:56:38.724220 28142 gc.cpp:56] Scheduling '/tmp/SlaveRecoveryTest_0_RecoverCompletedExecutor_WO4xyx/meta/slaves/0d54e2f1-43d7-4f74-8532-9c37ac40791b-S0/frameworks/0d54e2f1-43d7-4f74-8532-9c37ac40791b-0000/executors/0bbb69a3-3ba3-4016-86ed-4d80c1bf594b' for gc 6.99999161943111days in the future DEBUG: I1020 08:56:38.724318 28142 status_update_manager.cpp:284] Closing status update streams for framework 0d54e2f1-43d7-4f74-8532-9c37ac40791b-0000 DEBUG: I1020 08:56:38.724431 28142 gc.cpp:56] Scheduling '/tmp/SlaveRecoveryTest_0_RecoverCompletedExecutor_WO4xyx/slaves/0d54e2f1-43d7-4f74-8532-9c37ac40791b-S0/frameworks/0d54e2f1-43d7-4f74-8532-9c37ac40791b-0000' for gc 6.99999161661037days in the future DEBUG: I1020 08:56:38.724465 28142 gc.cpp:56] Scheduling '/tmp/SlaveRecoveryTest_0_RecoverCompletedExecutor_WO4xyx/meta/slaves/0d54e2f1-43d7-4f74-8532-9c37ac40791b-S0/frameworks/0d54e2f1-43d7-4f74-8532-9c37ac40791b-0000' for gc 6.99999161520889days in the future DEBUG: I1020 08:56:38.724778 28140 status_update_manager.cpp:202] Recovering status update manager DEBUG: I1020 08:56:38.724791 28140 status_update_manager.cpp:210] Recovering executor '0bbb69a3-3ba3-4016-86ed-4d80c1bf594b' of framework 0d54e2f1-43d7-4f74-8532-9c37ac40791b-0000 DEBUG: I1020 08:56:38.725278 28132 containerizer.cpp:385] Recovering containerizer DEBUG: I1020 08:56:38.725953 28142 slave.cpp:4222] Finished recovery DEBUG: I1020 08:56:38.726372 28131 status_update_manager.cpp:176] Pausing sending status updates DEBUG: I1020 08:56:38.726676 28133 slave.cpp:726] New master detected at master@172.16.132.117:37586 DEBUG: I1020 08:56:38.726698 28133 slave.cpp:789] Authenticating with master master@172.16.132.117:37586 DEBUG: I1020 08:56:38.726706 28133 slave.cpp:794] Using default CRAM-MD5 authenticatee DEBUG: I1020 08:56:38.726750 28133 slave.cpp:762] Detecting new master DEBUG: I1020 08:56:38.726832 28153 authenticatee.cpp:115] Creating new client SASL connection DEBUG: I1020 08:56:38.727023 28153 master.cpp:5144] Authenticating slave(256)@172.16.132.117:37586 DEBUG: I1020 08:56:38.727404 28149 authenticator.cpp:92] Creating new server SASL connection DEBUG: I1020 08:56:38.727483 28149 authenticatee.cpp:206] Received SASL authentication mechanisms: CRAM-MD5 DEBUG: I1020 08:56:38.727496 28149 authenticatee.cpp:232] Attempting to authenticate with mechanism 'CRAM-MD5' DEBUG: I1020 08:56:38.727525 28149 authenticator.cpp:197] Received SASL authentication start DEBUG: I1020 08:56:38.727591 28149 authenticator.cpp:319] Authentication requires more steps DEBUG: I1020 08:56:38.727670 28135 authenticatee.cpp:252] Received SASL authentication step DEBUG: I1020 08:56:38.727736 28135 authenticator.cpp:225] Received SASL authentication step DEBUG: I1020 08:56:38.727766 28135 authenticator.cpp:311] Authentication success DEBUG: I1020 08:56:38.727821 28147 authenticatee.cpp:292] Authentication success DEBUG: I1020 08:56:38.727826 28132 master.cpp:5174] Successfully authenticated principal 'test-principal' at slave(256)@172.16.132.117:37586 DEBUG: I1020 08:56:38.728272 28144 slave.cpp:857] Successfully authenticated with master master@172.16.132.117:37586 DEBUG: I1020 08:56:38.728466 28153 master.cpp:4028] Re-registering slave 0d54e2f1-43d7-4f74-8532-9c37ac40791b-S0 at slave(255)@172.16.132.117:37586 (smfc-ahy-19-sr2.corpdc.twitter.com) DEBUG: W1020 08:56:38.728508 28153 master.cpp:5241] Task 0bbb69a3-3ba3-4016-86ed-4d80c1bf594b of framework 0d54e2f1-43d7-4f74-8532-9c37ac40791b-0000 unknown to the slave 0d54e2f1-43d7-4f74-8532-9c37ac40791b-S0 at slave(256)@172.16.132.117:37586 (smfc-ahy-19-sr2.corpdc.twitter.com) during re-registration: reconciling with the slave DEBUG: I1020 08:56:38.728565 28153 master.cpp:4216] Sending updated checkpointed resources to slave 0d54e2f1-43d7-4f74-8532-9c37ac40791b-S0 at slave(256)@172.16.132.117:37586 (smfc-ahy-19-sr2.corpdc.twitter.com) DEBUG: I1020 08:56:38.728612 28145 slave.cpp:1001] Re-registered with master master@172.16.132.117:37586 DEBUG: I1020 08:56:38.728628 28140 hierarchical.cpp:408] Slave 0d54e2f1-43d7-4f74-8532-9c37ac40791b-S0 reactivated DEBUG: I1020 08:56:38.728693 28145 slave.cpp:1037] Forwarding total oversubscribed resources DEBUG: I1020 08:56:38.728703 28132 status_update_manager.cpp:183] Resuming sending status updates DEBUG: W1020 08:56:38.728718 28145 slave.cpp:1077] Slave reconciling task 0bbb69a3-3ba3-4016-86ed-4d80c1bf594b of framework 0d54e2f1-43d7-4f74-8532-9c37ac40791b-0000 in state TASK_LOST: task unknown to the slave DEBUG: I1020 08:56:38.728761 28135 master.cpp:4278] Received update of slave 0d54e2f1-43d7-4f74-8532-9c37ac40791b-S0 at slave(256)@172.16.132.117:37586 (smfc-ahy-19-sr2.corpdc.twitter.com) with total oversubscribed resources DEBUG: I1020 08:56:38.728778 28141 status_update_manager.cpp:322] Received status update TASK_LOST (UUID: 30c316cf-fda2-4ab4-9c43-910e9416bab2) for task 0bbb69a3-3ba3-4016-86ed-4d80c1bf594b of framework 0d54e2f1-43d7-4f74-8532-9c37ac40791b-0000 DEBUG: I1020 08:56:38.728927 28135 hierarchical.cpp:391] Slave 0d54e2f1-43d7-4f74-8532-9c37ac40791b-S0 (smfc-ahy-19-sr2.corpdc.twitter.com) updated with oversubscribed resources (total: cpus(*):2; mem(*):1024; disk(*):1024; ports(*):[31000-32000], allocated: ) DEBUG: I1020 08:56:38.728973 28145 slave.cpp:2284] Updated checkpointed resources from to DEBUG: I1020 08:56:38.729003 28145 slave.cpp:3090] Forwarding the update TASK_LOST (UUID: 30c316cf-fda2-4ab4-9c43-910e9416bab2) for task 0bbb69a3-3ba3-4016-86ed-4d80c1bf594b of framework 0d54e2f1-43d7-4f74-8532-9c37ac40791b-0000 to master@172.16.132.117:37586 DEBUG: I1020 08:56:38.729135 28148 master.cpp:4973] Sending 1 offers to framework 0d54e2f1-43d7-4f74-8532-9c37ac40791b-0000 (default) at scheduler-560af120-1924-4956-b664-3c4a67d0fe93@172.16.132.117:37586 DEBUG: I1020 08:56:38.729395 28148 master.cpp:4421] Status update TASK_LOST (UUID: 30c316cf-fda2-4ab4-9c43-910e9416bab2) for task 0bbb69a3-3ba3-4016-86ed-4d80c1bf594b of framework 0d54e2f1-43d7-4f74-8532-9c37ac40791b-0000 from slave 0d54e2f1-43d7-4f74-8532-9c37ac40791b-S0 at slave(256)@172.16.132.117:37586 (smfc-ahy-19-sr2.corpdc.twitter.com) DEBUG: I1020 08:56:38.729423 28148 master.cpp:4460] Forwarding status update TASK_LOST (UUID: 30c316cf-fda2-4ab4-9c43-910e9416bab2) for task 0bbb69a3-3ba3-4016-86ed-4d80c1bf594b of framework 0d54e2f1-43d7-4f74-8532-9c37ac40791b-0000 DEBUG: E1020 08:56:38.729467 28148 master.cpp:6032] Ignoring status update for the terminated task 0bbb69a3-3ba3-4016-86ed-4d80c1bf594b (TASK_FINISHED -> TASK_LOST) of framework 0d54e2f1-43d7-4f74-8532-9c37ac40791b-0000 DEBUG: I1020 08:56:38.729768 28115 sched.cpp:1781] Asked to stop the driver DEBUG: I1020 08:56:38.729800 28115 master.cpp:925] Master terminating DEBUG: I1020 08:56:38.729825 28115 master.cpp:6154] Removing task 0bbb69a3-3ba3-4016-86ed-4d80c1bf594b with resources cpus(*):2; mem(*):1024; disk(*):1024; ports(*):[31000-32000] of framework 0d54e2f1-43d7-4f74-8532-9c37ac40791b-0000 on slave 0d54e2f1-43d7-4f74-8532-9c37ac40791b-S0 at slave(256)@172.16.132.117:37586 (smfc-ahy-19-sr2.corpdc.twitter.com) DEBUG: I1020 08:56:38.729920 28137 hierarchical.cpp:364] Removed slave 0d54e2f1-43d7-4f74-8532-9c37ac40791b-S0 DEBUG: I1020 08:56:38.729949 28135 sched.cpp:1042] Stopping framework '0d54e2f1-43d7-4f74-8532-9c37ac40791b-0000' DEBUG: I1020 08:56:38.730121 28137 hierarchical.cpp:220] Removed framework 0d54e2f1-43d7-4f74-8532-9c37ac40791b-0000 DEBUG: I1020 08:56:38.730507 28135 slave.cpp:3258] master@172.16.132.117:37586 exited DEBUG: W1020 08:56:38.730526 28135 slave.cpp:3261] Master disconnected! Waiting for a new master to be elected DEBUG: [ FAILED ] SlaveRecoveryTest/0.RecoverCompletedExecutor, where TypeParam = mesos::internal::slave::MesosContainerizer (2288 ms)
Attachments
Issue Links
- duplicates
-
MESOS-3813 SlaveRecoveryTest/0.RecoverCompletedExecutor is flaky
- Resolved
- relates to
-
MESOS-2864 Master should not change the state of a terminal task if it receives another terminal update
- Resolved