Uploaded image for project: 'Mesos'
  1. Mesos
  2. MESOS-2228

SlaveTest.MesosExecutorGracefulShutdown is flaky

Attach filesAttach ScreenshotVotersWatch issueWatchersLinkCloneUpdate Comment AuthorReplace String in CommentUpdate Comment VisibilityDelete Comments
    XMLWordPrintableJSON

Details

    • Bug
    • Status: Resolved
    • Major
    • Resolution: Not A Problem
    • 0.22.0
    • 0.22.0
    • test
    • Twitter Mesos Q1 Sprint 1, Twitter Mesos Q1 Sprint 2
    • 3

    Description

      Observed this on internal CI

      [ RUN      ] SlaveTest.MesosExecutorGracefulShutdown
      Using temporary directory '/tmp/SlaveTest_MesosExecutorGracefulShutdown_AWdtVJ'
      I0124 08:14:04.399211  7926 leveldb.cpp:176] Opened db in 27.364056ms
      I0124 08:14:04.402632  7926 leveldb.cpp:183] Compacted db in 3.357646ms
      I0124 08:14:04.402691  7926 leveldb.cpp:198] Created db iterator in 23822ns
      I0124 08:14:04.402708  7926 leveldb.cpp:204] Seeked to beginning of db in 1913ns
      I0124 08:14:04.402716  7926 leveldb.cpp:273] Iterated through 0 keys in the db in 458ns
      I0124 08:14:04.402767  7926 replica.cpp:744] Replica recovered with log positions 0 -> 0 with 1 holes and 0 unlearned
      I0124 08:14:04.403728  7951 recover.cpp:449] Starting replica recovery
      I0124 08:14:04.404011  7951 recover.cpp:475] Replica is in EMPTY status
      I0124 08:14:04.407765  7950 replica.cpp:641] Replica in EMPTY status received a broadcasted recover request
      I0124 08:14:04.408710  7951 recover.cpp:195] Received a recover response from a replica in EMPTY status
      I0124 08:14:04.419666  7951 recover.cpp:566] Updating replica status to STARTING
      I0124 08:14:04.429719  7953 master.cpp:262] Master 20150124-081404-16842879-47787-7926 (utopic) started on 127.0.1.1:47787
      I0124 08:14:04.429790  7953 master.cpp:308] Master only allowing authenticated frameworks to register
      I0124 08:14:04.429802  7953 master.cpp:313] Master only allowing authenticated slaves to register
      I0124 08:14:04.429826  7953 credentials.hpp:36] Loading credentials for authentication from '/tmp/SlaveTest_MesosExecutorGracefulShutdown_AWdtVJ/credentials'
      I0124 08:14:04.430277  7953 master.cpp:357] Authorization enabled
      I0124 08:14:04.432682  7953 master.cpp:1219] The newly elected leader is master@127.0.1.1:47787 with id 20150124-081404-16842879-47787-7926
      I0124 08:14:04.432816  7953 master.cpp:1232] Elected as the leading master!
      I0124 08:14:04.432894  7953 master.cpp:1050] Recovering from registrar
      I0124 08:14:04.433212  7950 registrar.cpp:313] Recovering registrar
      I0124 08:14:04.434226  7951 leveldb.cpp:306] Persisting metadata (8 bytes) to leveldb took 14.323302ms
      I0124 08:14:04.434270  7951 replica.cpp:323] Persisted replica status to STARTING
      I0124 08:14:04.434489  7951 recover.cpp:475] Replica is in STARTING status
      I0124 08:14:04.436164  7951 replica.cpp:641] Replica in STARTING status received a broadcasted recover request
      I0124 08:14:04.439368  7947 recover.cpp:195] Received a recover response from a replica in STARTING status
      I0124 08:14:04.440626  7947 recover.cpp:566] Updating replica status to VOTING
      I0124 08:14:04.443667  7947 leveldb.cpp:306] Persisting metadata (8 bytes) to leveldb took 2.698664ms
      I0124 08:14:04.443759  7947 replica.cpp:323] Persisted replica status to VOTING
      I0124 08:14:04.443925  7947 recover.cpp:580] Successfully joined the Paxos group
      I0124 08:14:04.444160  7947 recover.cpp:464] Recover process terminated
      I0124 08:14:04.444543  7949 log.cpp:660] Attempting to start the writer
      I0124 08:14:04.446331  7949 replica.cpp:477] Replica received implicit promise request with proposal 1
      I0124 08:14:04.449329  7949 leveldb.cpp:306] Persisting metadata (8 bytes) to leveldb took 2.690453ms
      I0124 08:14:04.449388  7949 replica.cpp:345] Persisted promised to 1
      I0124 08:14:04.450637  7947 coordinator.cpp:230] Coordinator attemping to fill missing position
      I0124 08:14:04.452271  7949 replica.cpp:378] Replica received explicit promise request for position 0 with proposal 2
      I0124 08:14:04.455124  7949 leveldb.cpp:343] Persisting action (8 bytes) to leveldb took 2.593522ms
      I0124 08:14:04.455157  7949 replica.cpp:679] Persisted action at 0
      I0124 08:14:04.456594  7951 replica.cpp:511] Replica received write request for position 0
      I0124 08:14:04.456657  7951 leveldb.cpp:438] Reading position from leveldb took 30358ns
      I0124 08:14:04.464860  7951 leveldb.cpp:343] Persisting action (14 bytes) to leveldb took 8.164646ms
      I0124 08:14:04.464903  7951 replica.cpp:679] Persisted action at 0
      I0124 08:14:04.465947  7949 replica.cpp:658] Replica received learned notice for position 0
      I0124 08:14:04.471567  7949 leveldb.cpp:343] Persisting action (16 bytes) to leveldb took 5.587838ms
      I0124 08:14:04.471601  7949 replica.cpp:679] Persisted action at 0
      I0124 08:14:04.471622  7949 replica.cpp:664] Replica learned NOP action at position 0
      I0124 08:14:04.472682  7951 log.cpp:676] Writer started with ending position 0
      I0124 08:14:04.473919  7951 leveldb.cpp:438] Reading position from leveldb took 28676ns
      I0124 08:14:04.491591  7951 registrar.cpp:346] Successfully fetched the registry (0B) in 58.337024ms
      I0124 08:14:04.491704  7951 registrar.cpp:445] Applied 1 operations in 28163ns; attempting to update the 'registry'
      I0124 08:14:04.493938  7953 log.cpp:684] Attempting to append 118 bytes to the log
      I0124 08:14:04.494122  7953 coordinator.cpp:340] Coordinator attempting to write APPEND action at position 1
      I0124 08:14:04.495069  7953 replica.cpp:511] Replica received write request for position 1
      I0124 08:14:04.500089  7953 leveldb.cpp:343] Persisting action (135 bytes) to leveldb took 4.989356ms
      I0124 08:14:04.500123  7953 replica.cpp:679] Persisted action at 1
      I0124 08:14:04.501271  7950 replica.cpp:658] Replica received learned notice for position 1
      I0124 08:14:04.505698  7950 leveldb.cpp:343] Persisting action (137 bytes) to leveldb took 4.396221ms
      I0124 08:14:04.505734  7950 replica.cpp:679] Persisted action at 1
      I0124 08:14:04.505755  7950 replica.cpp:664] Replica learned APPEND action at position 1
      I0124 08:14:04.507313  7950 registrar.cpp:490] Successfully updated the 'registry' in 15.52896ms
      I0124 08:14:04.507478  7953 log.cpp:703] Attempting to truncate the log to 1
      I0124 08:14:04.507848  7953 coordinator.cpp:340] Coordinator attempting to write TRUNCATE action at position 2
      I0124 08:14:04.508743  7953 replica.cpp:511] Replica received write request for position 2
      I0124 08:14:04.509214  7950 registrar.cpp:376] Successfully recovered registrar
      I0124 08:14:04.509682  7946 master.cpp:1077] Recovered 0 slaves from the Registry (82B) ; allowing 10mins for slaves to re-register
      I0124 08:14:04.514654  7953 leveldb.cpp:343] Persisting action (16 bytes) to leveldb took 5.880031ms
      I0124 08:14:04.514689  7953 replica.cpp:679] Persisted action at 2
      I0124 08:14:04.515736  7953 replica.cpp:658] Replica received learned notice for position 2
      I0124 08:14:04.522014  7953 leveldb.cpp:343] Persisting action (18 bytes) to leveldb took 6.245138ms
      I0124 08:14:04.522086  7953 leveldb.cpp:401] Deleting ~1 keys from leveldb took 37803ns
      I0124 08:14:04.522107  7953 replica.cpp:679] Persisted action at 2
      I0124 08:14:04.522128  7953 replica.cpp:664] Replica learned TRUNCATE action at position 2
      I0124 08:14:04.531460  7926 containerizer.cpp:103] Using isolation: posix/cpu,posix/mem
      I0124 08:14:04.547194  7951 slave.cpp:173] Slave started on 208)@127.0.1.1:47787
      I0124 08:14:04.555682  7951 credentials.hpp:84] Loading credential for authentication from '/tmp/SlaveTest_MesosExecutorGracefulShutdown_kB74xo/credential'
      I0124 08:14:04.556622  7951 slave.cpp:282] Slave using credential for: test-principal
      I0124 08:14:04.557052  7951 slave.cpp:300] Slave resources: cpus(*):2; mem(*):1024; disk(*):1024; ports(*):[31000-32000]
      I0124 08:14:04.557842  7951 slave.cpp:329] Slave hostname: utopic
      I0124 08:14:04.558091  7951 slave.cpp:330] Slave checkpoint: false
      W0124 08:14:04.558352  7951 slave.cpp:332] Disabling checkpointing is deprecated and the --checkpoint flag will be removed in a future release. Please avoid using this flag
      I0124 08:14:04.566864  7948 state.cpp:33] Recovering state from '/tmp/SlaveTest_MesosExecutorGracefulShutdown_kB74xo/meta'
      I0124 08:14:04.575711  7951 status_update_manager.cpp:197] Recovering status update manager
      I0124 08:14:04.575904  7951 containerizer.cpp:300] Recovering containerizer
      I0124 08:14:04.577112  7951 slave.cpp:3519] Finished recovery
      I0124 08:14:04.577374  7926 sched.cpp:151] Version: 0.22.0
      I0124 08:14:04.578663  7950 sched.cpp:248] New master detected at master@127.0.1.1:47787
      I0124 08:14:04.578759  7950 sched.cpp:304] Authenticating with master master@127.0.1.1:47787
      I0124 08:14:04.578781  7950 sched.cpp:311] Using default CRAM-MD5 authenticatee
      I0124 08:14:04.579071  7950 authenticatee.hpp:138] Creating new client SASL connection
      I0124 08:14:04.579550  7947 master.cpp:4129] Authenticating scheduler-4a6c5cde-c54a-455a-aaad-6fc4e8ee99ef@127.0.1.1:47787
      I0124 08:14:04.579582  7947 master.cpp:4140] Using default CRAM-MD5 authenticator
      I0124 08:14:04.580031  7947 authenticator.hpp:170] Creating new server SASL connection
      I0124 08:14:04.580402  7947 authenticatee.hpp:229] Received SASL authentication mechanisms: CRAM-MD5
      I0124 08:14:04.580430  7947 authenticatee.hpp:255] Attempting to authenticate with mechanism 'CRAM-MD5'
      I0124 08:14:04.580538  7947 authenticator.hpp:276] Received SASL authentication start
      I0124 08:14:04.580581  7947 authenticator.hpp:398] Authentication requires more steps
      I0124 08:14:04.580651  7947 authenticatee.hpp:275] Received SASL authentication step
      I0124 08:14:04.580746  7947 authenticator.hpp:304] Received SASL authentication step
      I0124 08:14:04.580837  7947 authenticator.hpp:390] Authentication success
      I0124 08:14:04.580940  7947 authenticatee.hpp:315] Authentication success
      I0124 08:14:04.581009  7947 master.cpp:4187] Successfully authenticated principal 'test-principal' at scheduler-4a6c5cde-c54a-455a-aaad-6fc4e8ee99ef@127.0.1.1:47787
      I0124 08:14:04.581328  7947 sched.cpp:392] Successfully authenticated with master master@127.0.1.1:47787
      I0124 08:14:04.581509  7947 master.cpp:1420] Received registration request for framework 'default' at scheduler-4a6c5cde-c54a-455a-aaad-6fc4e8ee99ef@127.0.1.1:47787
      I0124 08:14:04.581585  7947 master.cpp:1298] Authorizing framework principal 'test-principal' to receive offers for role '*'
      I0124 08:14:04.582033  7947 master.cpp:1484] Registering framework 20150124-081404-16842879-47787-7926-0000 (default) at scheduler-4a6c5cde-c54a-455a-aaad-6fc4e8ee99ef@127.0.1.1:47787
      I0124 08:14:04.582595  7947 hierarchical_allocator_process.hpp:319] Added framework 20150124-081404-16842879-47787-7926-0000
      I0124 08:14:04.583051  7947 sched.cpp:442] Framework registered with 20150124-081404-16842879-47787-7926-0000
      I0124 08:14:04.584087  7951 slave.cpp:613] New master detected at master@127.0.1.1:47787
      I0124 08:14:04.584388  7951 slave.cpp:676] Authenticating with master master@127.0.1.1:47787
      I0124 08:14:04.584564  7951 slave.cpp:681] Using default CRAM-MD5 authenticatee
      I0124 08:14:04.584951  7951 slave.cpp:649] Detecting new master
      I0124 08:14:04.585219  7951 status_update_manager.cpp:171] Pausing sending status updates
      I0124 08:14:04.585604  7951 authenticatee.hpp:138] Creating new client SASL connection
      I0124 08:14:04.587666  7953 master.cpp:4129] Authenticating slave(208)@127.0.1.1:47787
      I0124 08:14:04.587702  7953 master.cpp:4140] Using default CRAM-MD5 authenticator
      I0124 08:14:04.588434  7953 authenticator.hpp:170] Creating new server SASL connection
      I0124 08:14:04.588764  7953 authenticatee.hpp:229] Received SASL authentication mechanisms: CRAM-MD5
      I0124 08:14:04.588790  7953 authenticatee.hpp:255] Attempting to authenticate with mechanism 'CRAM-MD5'
      I0124 08:14:04.588896  7953 authenticator.hpp:276] Received SASL authentication start
      I0124 08:14:04.588935  7953 authenticator.hpp:398] Authentication requires more steps
      I0124 08:14:04.589005  7953 authenticatee.hpp:275] Received SASL authentication step
      I0124 08:14:04.589082  7953 authenticator.hpp:304] Received SASL authentication step
      I0124 08:14:04.589140  7953 authenticator.hpp:390] Authentication success
      I0124 08:14:04.589232  7953 authenticatee.hpp:315] Authentication success
      I0124 08:14:04.589300  7953 master.cpp:4187] Successfully authenticated principal 'test-principal' at slave(208)@127.0.1.1:47787
      I0124 08:14:04.589587  7953 slave.cpp:747] Successfully authenticated with master master@127.0.1.1:47787
      I0124 08:14:04.589913  7953 master.cpp:3275] Registering slave at slave(208)@127.0.1.1:47787 (utopic) with id 20150124-081404-16842879-47787-7926-S0
      I0124 08:14:04.590322  7953 registrar.cpp:445] Applied 1 operations in 60404ns; attempting to update the 'registry'
      I0124 08:14:04.595336  7948 log.cpp:684] Attempting to append 283 bytes to the log
      I0124 08:14:04.595552  7948 coordinator.cpp:340] Coordinator attempting to write APPEND action at position 3
      I0124 08:14:04.596535  7948 replica.cpp:511] Replica received write request for position 3
      I0124 08:14:04.597846  7951 master.cpp:3263] Ignoring register slave message from slave(208)@127.0.1.1:47787 (utopic) as admission is already in progress
      I0124 08:14:04.602326  7948 leveldb.cpp:343] Persisting action (302 bytes) to leveldb took 5.758211ms
      I0124 08:14:04.602363  7948 replica.cpp:679] Persisted action at 3
      I0124 08:14:04.603492  7951 replica.cpp:658] Replica received learned notice for position 3
      I0124 08:14:04.608952  7951 leveldb.cpp:343] Persisting action (304 bytes) to leveldb took 5.427195ms
      I0124 08:14:04.608985  7951 replica.cpp:679] Persisted action at 3
      I0124 08:14:04.609007  7951 replica.cpp:664] Replica learned APPEND action at position 3
      I0124 08:14:04.610643  7951 registrar.cpp:490] Successfully updated the 'registry' in 20.258048ms
      I0124 08:14:04.610800  7948 log.cpp:703] Attempting to truncate the log to 3
      I0124 08:14:04.611184  7948 coordinator.cpp:340] Coordinator attempting to write TRUNCATE action at position 4
      I0124 08:14:04.612076  7948 replica.cpp:511] Replica received write request for position 4
      I0124 08:14:04.613061  7946 master.cpp:3329] Registered slave 20150124-081404-16842879-47787-7926-S0 at slave(208)@127.0.1.1:47787 (utopic) with cpus(*):2; mem(*):1024; disk(*):1024; ports(*):[31000-32000]
      I0124 08:14:04.613299  7946 hierarchical_allocator_process.hpp:453] Added slave 20150124-081404-16842879-47787-7926-S0 (utopic) with cpus(*):2; mem(*):1024; disk(*):1024; ports(*):[31000-32000] (and cpus(*):2; mem(*):1024; disk(*):1024; ports(*):[31000-32000] available)
      I0124 08:14:04.613688  7946 slave.cpp:781] Registered with master master@127.0.1.1:47787; given slave ID 20150124-081404-16842879-47787-7926-S0
      I0124 08:14:04.614112  7946 master.cpp:4071] Sending 1 offers to framework 20150124-081404-16842879-47787-7926-0000 (default) at scheduler-4a6c5cde-c54a-455a-aaad-6fc4e8ee99ef@127.0.1.1:47787
      I0124 08:14:04.614228  7946 status_update_manager.cpp:178] Resuming sending status updates
      I0124 08:14:04.617481  7947 master.cpp:2677] Processing ACCEPT call for offers: [ 20150124-081404-16842879-47787-7926-O0 ] on slave 20150124-081404-16842879-47787-7926-S0 at slave(208)@127.0.1.1:47787 (utopic) for framework 20150124-081404-16842879-47787-7926-0000 (default) at scheduler-4a6c5cde-c54a-455a-aaad-6fc4e8ee99ef@127.0.1.1:47787
      I0124 08:14:04.617535  7947 master.cpp:2513] Authorizing framework principal 'test-principal' to launch task 7c16772d-4aed-4719-81c4-658a2cc22543 as user 'jenkins'
      I0124 08:14:04.618736  7947 master.hpp:782] Adding task 7c16772d-4aed-4719-81c4-658a2cc22543 with resources cpus(*):2; mem(*):1024; disk(*):1024; ports(*):[31000-32000] on slave 20150124-081404-16842879-47787-7926-S0 (utopic)
      I0124 08:14:04.618854  7947 master.cpp:2885] Launching task 7c16772d-4aed-4719-81c4-658a2cc22543 of framework 20150124-081404-16842879-47787-7926-0000 (default) at scheduler-4a6c5cde-c54a-455a-aaad-6fc4e8ee99ef@127.0.1.1:47787 with resources cpus(*):2; mem(*):1024; disk(*):1024; ports(*):[31000-32000] on slave 20150124-081404-16842879-47787-7926-S0 at slave(208)@127.0.1.1:47787 (utopic)
      I0124 08:14:04.619209  7947 slave.cpp:1130] Got assigned task 7c16772d-4aed-4719-81c4-658a2cc22543 for framework 20150124-081404-16842879-47787-7926-0000
      I0124 08:14:04.619472  7948 leveldb.cpp:343] Persisting action (16 bytes) to leveldb took 7.364828ms
      I0124 08:14:04.619941  7948 replica.cpp:679] Persisted action at 4
      I0124 08:14:04.624851  7953 replica.cpp:658] Replica received learned notice for position 4
      I0124 08:14:04.625757  7947 slave.cpp:1245] Launching task 7c16772d-4aed-4719-81c4-658a2cc22543 for framework 20150124-081404-16842879-47787-7926-0000
      I0124 08:14:04.630590  7953 leveldb.cpp:343] Persisting action (18 bytes) to leveldb took 5.705336ms
      I0124 08:14:04.630805  7953 leveldb.cpp:401] Deleting ~2 keys from leveldb took 51263ns
      I0124 08:14:04.630828  7953 replica.cpp:679] Persisted action at 4
      I0124 08:14:04.630851  7953 replica.cpp:664] Replica learned TRUNCATE action at position 4
      I0124 08:14:04.633968  7947 slave.cpp:3921] Launching executor 7c16772d-4aed-4719-81c4-658a2cc22543 of framework 20150124-081404-16842879-47787-7926-0000 in work directory '/tmp/SlaveTest_MesosExecutorGracefulShutdown_kB74xo/slaves/20150124-081404-16842879-47787-7926-S0/frameworks/20150124-081404-16842879-47787-7926-0000/executors/7c16772d-4aed-4719-81c4-658a2cc22543/runs/53887a08-f11d-4a2f-a659-a715d9fcf3d2'
      I0124 08:14:04.634963  7951 containerizer.cpp:445] Starting container '53887a08-f11d-4a2f-a659-a715d9fcf3d2' for executor '7c16772d-4aed-4719-81c4-658a2cc22543' of framework '20150124-081404-16842879-47787-7926-0000'
      W0124 08:14:04.636931  7951 containerizer.cpp:296] CommandInfo.grace_period flag is not set, using default value: 3secs
      I0124 08:14:04.655591  7947 slave.cpp:1368] Queuing task '7c16772d-4aed-4719-81c4-658a2cc22543' for executor 7c16772d-4aed-4719-81c4-658a2cc22543 of framework '20150124-081404-16842879-47787-7926-0000
      I0124 08:14:04.656992  7951 launcher.cpp:137] Forked child with pid '11030' for container '53887a08-f11d-4a2f-a659-a715d9fcf3d2'
      I0124 08:14:04.673646  7951 slave.cpp:2890] Monitoring executor '7c16772d-4aed-4719-81c4-658a2cc22543' of framework '20150124-081404-16842879-47787-7926-0000' in container '53887a08-f11d-4a2f-a659-a715d9fcf3d2'
      I0124 08:14:04.964946 11044 exec.cpp:147] Version: 0.22.0
      I0124 08:14:05.113059  7948 slave.cpp:1912] Got registration for executor '7c16772d-4aed-4719-81c4-658a2cc22543' of framework 20150124-081404-16842879-47787-7926-0000 from executor(1)@127.0.1.1:49174
      I0124 08:14:05.121086  7948 slave.cpp:2031] Flushing queued task 7c16772d-4aed-4719-81c4-658a2cc22543 for executor '7c16772d-4aed-4719-81c4-658a2cc22543' of framework 20150124-081404-16842879-47787-7926-0000
      I0124 08:14:05.266849 11062 exec.cpp:221] Executor registered on slave 20150124-081404-16842879-47787-7926-S0
      Shutdown timeout is set to 3secsRegistered executor on utopic
      Starting task 7c16772d-4aed-4719-81c4-658a2cc22543
      Forked command at 11067
      sh -c 'sleep 1000'
      I0124 08:14:05.492084  7953 slave.cpp:2265] Handling status update TASK_RUNNING (UUID: 54742a87-ef02-4e72-a19b-83b0eeb62568) for task 7c16772d-4aed-4719-81c4-658a2cc22543 of framework 20150124-081404-16842879-47787-7926-0000 from executor(1)@127.0.1.1:49174
      I0124 08:14:05.492805  7953 status_update_manager.cpp:317] Received status update TASK_RUNNING (UUID: 54742a87-ef02-4e72-a19b-83b0eeb62568) for task 7c16772d-4aed-4719-81c4-658a2cc22543 of framework 20150124-081404-16842879-47787-7926-0000
      I0124 08:14:05.493762  7953 slave.cpp:2508] Forwarding the update TASK_RUNNING (UUID: 54742a87-ef02-4e72-a19b-83b0eeb62568) for task 7c16772d-4aed-4719-81c4-658a2cc22543 of framework 20150124-081404-16842879-47787-7926-0000 to master@127.0.1.1:47787
      I0124 08:14:05.493948  7953 slave.cpp:2441] Sending acknowledgement for status update TASK_RUNNING (UUID: 54742a87-ef02-4e72-a19b-83b0eeb62568) for task 7c16772d-4aed-4719-81c4-658a2cc22543 of framework 20150124-081404-16842879-47787-7926-0000 to executor(1)@127.0.1.1:49174
      I0124 08:14:05.495378  7949 master.cpp:3652] Forwarding status update TASK_RUNNING (UUID: 54742a87-ef02-4e72-a19b-83b0eeb62568) for task 7c16772d-4aed-4719-81c4-658a2cc22543 of framework 20150124-081404-16842879-47787-7926-0000
      I0124 08:14:05.495584  7949 master.cpp:3624] Status update TASK_RUNNING (UUID: 54742a87-ef02-4e72-a19b-83b0eeb62568) for task 7c16772d-4aed-4719-81c4-658a2cc22543 of framework 20150124-081404-16842879-47787-7926-0000 from slave 20150124-081404-16842879-47787-7926-S0 at slave(208)@127.0.1.1:47787 (utopic)
      I0124 08:14:05.495678  7949 master.cpp:4934] Updating the latest state of task 7c16772d-4aed-4719-81c4-658a2cc22543 of framework 20150124-081404-16842879-47787-7926-0000 to TASK_RUNNING
      I0124 08:14:05.496422  7949 master.cpp:3125] Forwarding status update acknowledgement 54742a87-ef02-4e72-a19b-83b0eeb62568 for task 7c16772d-4aed-4719-81c4-658a2cc22543 of framework 20150124-081404-16842879-47787-7926-0000 (default) at scheduler-4a6c5cde-c54a-455a-aaad-6fc4e8ee99ef@127.0.1.1:47787 to slave 20150124-081404-16842879-47787-7926-S0 at slave(208)@127.0.1.1:47787 (utopic)
      I0124 08:14:05.497735  7946 master.cpp:2961] Asked to kill task 7c16772d-4aed-4719-81c4-658a2cc22543 of framework 20150124-081404-16842879-47787-7926-0000
      I0124 08:14:05.497859  7946 master.cpp:3021] Telling slave 20150124-081404-16842879-47787-7926-S0 at slave(208)@127.0.1.1:47787 (utopic) to kill task 7c16772d-4aed-4719-81c4-658a2cc22543 of framework 20150124-081404-16842879-47787-7926-0000 (default) at scheduler-4a6c5cde-c54a-455a-aaad-6fc4e8ee99ef@127.0.1.1:47787
      I0124 08:14:05.498589  7947 status_update_manager.cpp:389] Received status update acknowledgement (UUID: 54742a87-ef02-4e72-a19b-83b0eeb62568) for task 7c16772d-4aed-4719-81c4-658a2cc22543 of framework 20150124-081404-16842879-47787-7926-0000
      I0124 08:14:05.499006  7953 slave.cpp:1424] Asked to kill task 7c16772d-4aed-4719-81c4-658a2cc22543 of framework 20150124-081404-16842879-47787-7926-0000
      Shutting down
      Sending SIGTERM to process tree at pid 11067
      Killing the following process trees:
      [ 
      -+- 11067 sh -c sleep 1000 
       \--- 11068 sleep 1000 
      ]
      2015-01-24 08:14:07,295:7926(0x7f30b1b34700):ZOO_ERROR@handle_socket_error_msg@1697: Socket [127.0.0.1:57753] zk retcode=-4, errno=111(Connection refused): server refused to accept the client
      Process 11067 did not terminate after 3secs, sending SIGKILL to process tree at 11067
      Killed the following process trees:
      [ 
      -+- 11067 sh -c sleep 1000 
       \--- 11068 sleep 1000 
      ]
      Command terminated with signal Killed (pid: 11067)
      I0124 08:14:09.063453  7953 slave.cpp:2265] Handling status update TASK_KILLED (UUID: 4bd05372-2705-46e5-8182-5cb6907fbab3) for task 7c16772d-4aed-4719-81c4-658a2cc22543 of framework 20150124-081404-16842879-47787-7926-0000 from executor(1)@127.0.1.1:49174
      I0124 08:14:09.069545  7953 status_update_manager.cpp:317] Received status update TASK_KILLED (UUID: 4bd05372-2705-46e5-8182-5cb6907fbab3) for task 7c16772d-4aed-4719-81c4-658a2cc22543 of framework 20150124-081404-16842879-47787-7926-0000
      I0124 08:14:09.070265  7953 slave.cpp:2508] Forwarding the update TASK_KILLED (UUID: 4bd05372-2705-46e5-8182-5cb6907fbab3) for task 7c16772d-4aed-4719-81c4-658a2cc22543 of framework 20150124-081404-16842879-47787-7926-0000 to master@127.0.1.1:47787
      I0124 08:14:09.070996  7947 master.cpp:3652] Forwarding status update TASK_KILLED (UUID: 4bd05372-2705-46e5-8182-5cb6907fbab3) for task 7c16772d-4aed-4719-81c4-658a2cc22543 of framework 20150124-081404-16842879-47787-7926-0000
      I0124 08:14:09.071182  7947 master.cpp:3624] Status update TASK_KILLED (UUID: 4bd05372-2705-46e5-8182-5cb6907fbab3) for task 7c16772d-4aed-4719-81c4-658a2cc22543 of framework 20150124-081404-16842879-47787-7926-0000 from slave 20150124-081404-16842879-47787-7926-S0 at slave(208)@127.0.1.1:47787 (utopic)
      I0124 08:14:09.071260  7947 master.cpp:4934] Updating the latest state of task 7c16772d-4aed-4719-81c4-658a2cc22543 of framework 20150124-081404-16842879-47787-7926-0000 to TASK_KILLED
      I0124 08:14:09.072052  7947 hierarchical_allocator_process.hpp:653] Recovered cpus(*):2; mem(*):1024; disk(*):1024; ports(*):[31000-32000] (total allocatable: cpus(*):2; mem(*):1024; disk(*):1024; ports(*):[31000-32000]) on slave 20150124-081404-16842879-47787-7926-S0 from framework 20150124-081404-16842879-47787-7926-0000
      I0124 08:14:09.072449  7947 master.cpp:4993] Removing task 7c16772d-4aed-4719-81c4-658a2cc22543 with resources cpus(*):2; mem(*):1024; disk(*):1024; ports(*):[31000-32000] of framework 20150124-081404-16842879-47787-7926-0000 on slave 20150124-081404-16842879-47787-7926-S0 at slave(208)@127.0.1.1:47787 (utopic)
      I0124 08:14:09.072700  7947 master.cpp:3125] Forwarding status update acknowledgement 4bd05372-2705-46e5-8182-5cb6907fbab3 for task 7c16772d-4aed-4719-81c4-658a2cc22543 of framework 20150124-081404-16842879-47787-7926-0000 (default) at scheduler-4a6c5cde-c54a-455a-aaad-6fc4e8ee99ef@127.0.1.1:47787 to slave 20150124-081404-16842879-47787-7926-S0 at slave(208)@127.0.1.1:47787 (utopic)
      ../../src/tests/slave_tests.cpp:1736: Failure
      Expected: (std::string::npos) != (statusKilled.get().message().find("Terminated")), actual: 18446744073709551615 vs 18446744073709551615
      I0124 08:14:09.073422  7926 sched.cpp:1471] Asked to stop the driver
      I0124 08:14:09.073629  7926 master.cpp:654] Master terminating
      I0124 08:14:09.075768  7950 sched.cpp:808] Stopping framework '20150124-081404-16842879-47787-7926-0000'
      I0124 08:14:09.079352  7953 slave.cpp:2441] Sending acknowledgement for status update TASK_KILLED (UUID: 4bd05372-2705-46e5-8182-5cb6907fbab3) for task 7c16772d-4aed-4719-81c4-658a2cc22543 of framework 20150124-081404-16842879-47787-7926-0000 to executor(1)@127.0.1.1:49174
      I0124 08:14:09.085199  7953 slave.cpp:2673] master@127.0.1.1:47787 exited
      W0124 08:14:09.085232  7953 slave.cpp:2676] Master disconnected! Waiting for a new master to be elected
      I0124 08:14:09.085263  7953 status_update_manager.cpp:389] Received status update acknowledgement (UUID: 4bd05372-2705-46e5-8182-5cb6907fbab3) for task 7c16772d-4aed-4719-81c4-658a2cc22543 of framework 20150124-081404-16842879-47787-7926-0000
      I0124 08:14:09.120879  7946 containerizer.cpp:879] Destroying container '53887a08-f11d-4a2f-a659-a715d9fcf3d2'
      I0124 08:14:09.216553  7952 containerizer.cpp:1084] Executor for container '53887a08-f11d-4a2f-a659-a715d9fcf3d2' has exited
      I0124 08:14:09.218641  7952 slave.cpp:2948] Executor '7c16772d-4aed-4719-81c4-658a2cc22543' of framework 20150124-081404-16842879-47787-7926-0000 terminated with signal Killed
      I0124 08:14:09.218855  7952 slave.cpp:3057] Cleaning up executor '7c16772d-4aed-4719-81c4-658a2cc22543' of framework 20150124-081404-16842879-47787-7926-0000
      I0124 08:14:09.223268  7947 gc.cpp:56] Scheduling '/tmp/SlaveTest_MesosExecutorGracefulShutdown_kB74xo/slaves/20150124-081404-16842879-47787-7926-S0/frameworks/20150124-081404-16842879-47787-7926-0000/executors/7c16772d-4aed-4719-81c4-658a2cc22543/runs/53887a08-f11d-4a2f-a659-a715d9fcf3d2' for gc 6.99999746482667days in the future
      I0124 08:14:09.224205  7947 gc.cpp:56] Scheduling '/tmp/SlaveTest_MesosExecutorGracefulShutdown_kB74xo/slaves/20150124-081404-16842879-47787-7926-S0/frameworks/20150124-081404-16842879-47787-7926-0000/executors/7c16772d-4aed-4719-81c4-658a2cc22543' for gc 6.99999746293926days in the future
      I0124 08:14:09.227552  7952 slave.cpp:3136] Cleaning up framework 20150124-081404-16842879-47787-7926-0000
      I0124 08:14:09.229786  7949 status_update_manager.cpp:279] Closing status update streams for framework 20150124-081404-16842879-47787-7926-0000
      I0124 08:14:09.230849  7952 slave.cpp:495] Slave terminating
      I0124 08:14:09.230989  7952 gc.cpp:56] Scheduling '/tmp/SlaveTest_MesosExecutorGracefulShutdown_kB74xo/slaves/20150124-081404-16842879-47787-7926-S0/frameworks/20150124-081404-16842879-47787-7926-0000' for gc 6.99999732935407days in the future
      [  FAILED  ] SlaveTest.MesosExecutorGracefulShutdown (4881 ms)
      

      Attachments

        Activity

          This comment will be Viewable by All Users Viewable by All Users
          Cancel

          People

            bmahler Benjamin Mahler
            vinodkone Vinod Kone
            Votes:
            0 Vote for this issue
            Watchers:
            2 Start watching this issue

            Dates

              Created:
              Updated:
              Resolved:

              Agile

                Completed Sprints:
                Twitter Mesos Q1 Sprint 1 ended 02/Feb/15
                Twitter Mesos Q1 Sprint 2 ended 17/Feb/15
                View on Board

                Slack

                  Issue deployment