Uploaded image for project: 'Mesos'
  1. Mesos
  2. MESOS-1543

MasterTest.OrphanTasks is flaky

    XMLWordPrintableJSON

Details

    • Bug
    • Status: Resolved
    • Major
    • Resolution: Fixed
    • None
    • 0.20.0
    • test
    • None
    • Q2 Sprint 5

    Description

      Looks like the slave's detector was discarded before the slave shutdown.

      [ RUN      ] MasterTest.OrphanTasks
      Using temporary directory '/tmp/MasterTest_OrphanTasks_s8r6NU'
      I0625 16:07:58.861265 26203 leveldb.cpp:176] Opened db in 48.741714ms
      I0625 16:07:58.877020 26203 leveldb.cpp:183] Compacted db in 15.444957ms
      I0625 16:07:58.877488 26203 leveldb.cpp:198] Created db iterator in 8308ns
      I0625 16:07:58.877691 26203 leveldb.cpp:204] Seeked to beginning of db in 2133ns
      I0625 16:07:58.878000 26203 leveldb.cpp:273] Iterated through 0 keys in the db in 1896ns
      I0625 16:07:58.878221 26203 replica.cpp:741] Replica recovered with log positions 0 -> 0 with 1 holes and 0 unlearned
      I0625 16:07:58.878715 26229 recover.cpp:425] Starting replica recovery
      I0625 16:07:58.878783 26229 recover.cpp:451] Replica is in EMPTY status
      I0625 16:07:58.879034 26229 replica.cpp:638] Replica in EMPTY status received a broadcasted recover request
      I0625 16:07:58.879088 26229 recover.cpp:188] Received a recover response from a replica in EMPTY status
      I0625 16:07:58.879166 26229 recover.cpp:542] Updating replica status to STARTING
      I0625 16:07:58.882308 26228 master.cpp:288] Master 20140625-160758-16842879-47563-26203 (lucid) started on 127.0.1.1:47563
      I0625 16:07:58.882339 26228 master.cpp:325] Master only allowing authenticated frameworks to register
      I0625 16:07:58.882345 26228 master.cpp:330] Master only allowing authenticated slaves to register
      I0625 16:07:58.882351 26228 credentials.hpp:35] Loading credentials for authentication from '/tmp/MasterTest_OrphanTasks_s8r6NU/credentials'
      I0625 16:07:58.882427 26228 master.cpp:356] Authorization enabled
      I0625 16:07:58.883036 26228 hierarchical_allocator_process.hpp:301] Initializing hierarchical allocator process with master : master@127.0.1.1:47563
      I0625 16:07:58.883080 26228 master.cpp:122] No whitelist given. Advertising offers for all slaves
      I0625 16:07:58.883249 26228 master.cpp:1122] The newly elected leader is master@127.0.1.1:47563 with id 20140625-160758-16842879-47563-26203
      I0625 16:07:58.883262 26228 master.cpp:1135] Elected as the leading master!
      I0625 16:07:58.883268 26228 master.cpp:953] Recovering from registrar
      I0625 16:07:58.883323 26228 registrar.cpp:313] Recovering registrar
      I0625 16:07:58.895361 26229 leveldb.cpp:306] Persisting metadata (8 bytes) to leveldb took 16.126207ms
      I0625 16:07:58.895427 26229 replica.cpp:320] Persisted replica status to STARTING
      I0625 16:07:58.895555 26229 recover.cpp:451] Replica is in STARTING status
      I0625 16:07:58.895905 26229 replica.cpp:638] Replica in STARTING status received a broadcasted recover request
      I0625 16:07:58.895972 26229 recover.cpp:188] Received a recover response from a replica in STARTING status
      I0625 16:07:58.896081 26229 recover.cpp:542] Updating replica status to VOTING
      I0625 16:07:58.914243 26229 leveldb.cpp:306] Persisting metadata (8 bytes) to leveldb took 18.082424ms
      I0625 16:07:58.914566 26229 replica.cpp:320] Persisted replica status to VOTING
      I0625 16:07:58.914927 26229 recover.cpp:556] Successfully joined the Paxos group
      I0625 16:07:58.915011 26229 recover.cpp:440] Recover process terminated
      I0625 16:07:58.915163 26229 log.cpp:656] Attempting to start the writer
      I0625 16:07:58.915468 26229 replica.cpp:474] Replica received implicit promise request with proposal 1
      I0625 16:07:58.935538 26229 leveldb.cpp:306] Persisting metadata (8 bytes) to leveldb took 20.034561ms
      I0625 16:07:58.936040 26229 replica.cpp:342] Persisted promised to 1
      I0625 16:07:58.947181 26230 coordinator.cpp:230] Coordinator attemping to fill missing position
      I0625 16:07:58.947559 26230 replica.cpp:375] Replica received explicit promise request for position 0 with proposal 2
      I0625 16:07:58.962541 26230 leveldb.cpp:343] Persisting action (8 bytes) to leveldb took 14.946528ms
      I0625 16:07:58.962818 26230 replica.cpp:676] Persisted action at 0
      I0625 16:07:58.963426 26230 replica.cpp:508] Replica received write request for position 0
      I0625 16:07:58.963462 26230 leveldb.cpp:438] Reading position from leveldb took 19380ns
      I0625 16:07:58.982928 26230 leveldb.cpp:343] Persisting action (14 bytes) to leveldb took 19.433595ms
      I0625 16:07:58.983219 26230 replica.cpp:676] Persisted action at 0
      I0625 16:07:58.987048 26223 replica.cpp:655] Replica received learned notice for position 0
      I0625 16:07:59.000260 26223 leveldb.cpp:343] Persisting action (16 bytes) to leveldb took 13.156945ms
      I0625 16:07:59.000320 26223 replica.cpp:676] Persisted action at 0
      I0625 16:07:59.000331 26223 replica.cpp:661] Replica learned NOP action at position 0
      I0625 16:07:59.000578 26223 log.cpp:672] Writer started with ending position 0
      I0625 16:07:59.000834 26223 leveldb.cpp:438] Reading position from leveldb took 22576ns
      I0625 16:07:59.002019 26223 registrar.cpp:346] Successfully fetched the registry (0B)
      I0625 16:07:59.002043 26223 registrar.cpp:422] Attempting to update the 'registry'
      I0625 16:07:59.003206 26223 log.cpp:680] Attempting to append 118 bytes to the log
      I0625 16:07:59.003253 26223 coordinator.cpp:340] Coordinator attempting to write APPEND action at position 1
      I0625 16:07:59.003466 26223 replica.cpp:508] Replica received write request for position 1
      I0625 16:07:59.018249 26223 leveldb.cpp:343] Persisting action (135 bytes) to leveldb took 14.747648ms
      I0625 16:07:59.018712 26223 replica.cpp:676] Persisted action at 1
      I0625 16:07:59.027173 26227 replica.cpp:655] Replica received learned notice for position 1
      I0625 16:07:59.041352 26227 leveldb.cpp:343] Persisting action (137 bytes) to leveldb took 14.12408ms
      I0625 16:07:59.041412 26227 replica.cpp:676] Persisted action at 1
      I0625 16:07:59.041424 26227 replica.cpp:661] Replica learned APPEND action at position 1
      I0625 16:07:59.041815 26227 registrar.cpp:479] Successfully updated 'registry'
      I0625 16:07:59.041858 26227 registrar.cpp:372] Successfully recovered registrar
      I0625 16:07:59.041903 26227 log.cpp:699] Attempting to truncate the log to 1
      I0625 16:07:59.041981 26227 master.cpp:980] Recovered 0 slaves from the Registry (82B) ; allowing 10mins for slaves to re-register
      I0625 16:07:59.042021 26227 coordinator.cpp:340] Coordinator attempting to write TRUNCATE action at position 2
      I0625 16:07:59.042628 26226 replica.cpp:508] Replica received write request for position 2
      I0625 16:07:59.046061 26228 slave.cpp:168] Slave started on 117)@127.0.1.1:47563
      I0625 16:07:59.046083 26228 credentials.hpp:35] Loading credentials for authentication from '/tmp/MasterTest_OrphanTasks_R5G2iM/credential'
      I0625 16:07:59.046162 26228 slave.cpp:268] Slave using credential for: test-principal
      I0625 16:07:59.046252 26228 slave.cpp:281] Slave resources: cpus(*):2; mem(*):1024; disk(*):1024; ports(*):[31000-32000]
      I0625 16:07:59.046324 26228 slave.cpp:309] Slave hostname: lucid
      I0625 16:07:59.046334 26228 slave.cpp:310] Slave checkpoint: false
      I0625 16:07:59.046794 26228 state.cpp:33] Recovering state from '/tmp/MasterTest_OrphanTasks_R5G2iM/meta'
      I0625 16:07:59.046892 26228 status_update_manager.cpp:193] Recovering status update manager
      I0625 16:07:59.046967 26228 slave.cpp:3111] Finished recovery
      I0625 16:07:59.047200 26228 slave.cpp:584] New master detected at master@127.0.1.1:47563
      I0625 16:07:59.047224 26228 slave.cpp:660] Authenticating with master master@127.0.1.1:47563
      I0625 16:07:59.047263 26228 slave.cpp:633] Detecting new master
      I0625 16:07:59.047297 26228 status_update_manager.cpp:167] New master detected at master@127.0.1.1:47563
      I0625 16:07:59.047333 26228 authenticatee.hpp:128] Creating new client SASL connection
      I0625 16:07:59.047412 26228 master.cpp:3499] Authenticating slave(117)@127.0.1.1:47563
      I0625 16:07:59.047493 26228 authenticator.hpp:156] Creating new server SASL connection
      I0625 16:07:59.047550 26228 authenticatee.hpp:219] Received SASL authentication mechanisms: CRAM-MD5
      I0625 16:07:59.047564 26228 authenticatee.hpp:245] Attempting to authenticate with mechanism 'CRAM-MD5'
      I0625 16:07:59.047585 26228 authenticator.hpp:262] Received SASL authentication start
      I0625 16:07:59.047621 26228 authenticator.hpp:384] Authentication requires more steps
      I0625 16:07:59.047646 26228 authenticatee.hpp:265] Received SASL authentication step
      I0625 16:07:59.047678 26228 authenticator.hpp:290] Received SASL authentication step
      I0625 16:07:59.047693 26228 auxprop.cpp:81] Request to lookup properties for user: 'test-principal' realm: 'lucid' server FQDN: 'lucid' SASL_AUXPROP_OVERRIDE: false SASL_AUXPROP_AUTHZID: false 
      I0625 16:07:59.047699 26228 auxprop.cpp:153] Looking up auxiliary property '*userPassword'
      I0625 16:07:59.047708 26228 auxprop.cpp:153] Looking up auxiliary property '*cmusaslsecretCRAM-MD5'
      I0625 16:07:59.047716 26228 auxprop.cpp:81] Request to lookup properties for user: 'test-principal' realm: 'lucid' server FQDN: 'lucid' SASL_AUXPROP_OVERRIDE: false SASL_AUXPROP_AUTHZID: true 
      I0625 16:07:59.047721 26228 auxprop.cpp:103] Skipping auxiliary property '*userPassword' since SASL_AUXPROP_AUTHZID == true
      I0625 16:07:59.047726 26228 auxprop.cpp:103] Skipping auxiliary property '*cmusaslsecretCRAM-MD5' since SASL_AUXPROP_AUTHZID == true
      I0625 16:07:59.047736 26228 authenticator.hpp:376] Authentication success
      I0625 16:07:59.047760 26228 authenticatee.hpp:305] Authentication success
      I0625 16:07:59.047780 26228 master.cpp:3539] Successfully authenticated principal 'test-principal' at slave(117)@127.0.1.1:47563
      I0625 16:07:59.047847 26228 slave.cpp:717] Successfully authenticated with master master@127.0.1.1:47563
      I0625 16:07:59.047886 26228 slave.cpp:955] Will retry registration in 11.156008ms if necessary
      I0625 16:07:59.047976 26223 master.cpp:2781] Registering slave at slave(117)@127.0.1.1:47563 (lucid) with id 20140625-160758-16842879-47563-26203-0
      I0625 16:07:59.048074 26223 registrar.cpp:422] Attempting to update the 'registry'
      I0625 16:07:59.051329 26203 sched.cpp:139] Version: 0.20.0
      I0625 16:07:59.051686 26225 sched.cpp:235] New master detected at master@127.0.1.1:47563
      I0625 16:07:59.051718 26225 sched.cpp:285] Authenticating with master master@127.0.1.1:47563
      I0625 16:07:59.051816 26225 authenticatee.hpp:128] Creating new client SASL connection
      I0625 16:07:59.051900 26225 master.cpp:3499] Authenticating scheduler-949b4db2-5d34-4876-bff8-a706ff8e4c72@127.0.1.1:47563
      I0625 16:07:59.051978 26225 authenticator.hpp:156] Creating new server SASL connection
      I0625 16:07:59.052034 26225 authenticatee.hpp:219] Received SASL authentication mechanisms: CRAM-MD5
      I0625 16:07:59.052047 26225 authenticatee.hpp:245] Attempting to authenticate with mechanism 'CRAM-MD5'
      I0625 16:07:59.052067 26225 authenticator.hpp:262] Received SASL authentication start
      I0625 16:07:59.052099 26225 authenticator.hpp:384] Authentication requires more steps
      I0625 16:07:59.052124 26225 authenticatee.hpp:265] Received SASL authentication step
      I0625 16:07:59.052155 26225 authenticator.hpp:290] Received SASL authentication step
      I0625 16:07:59.052168 26225 auxprop.cpp:81] Request to lookup properties for user: 'test-principal' realm: 'lucid' server FQDN: 'lucid' SASL_AUXPROP_OVERRIDE: false SASL_AUXPROP_AUTHZID: false 
      I0625 16:07:59.052175 26225 auxprop.cpp:153] Looking up auxiliary property '*userPassword'
      I0625 16:07:59.052183 26225 auxprop.cpp:153] Looking up auxiliary property '*cmusaslsecretCRAM-MD5'
      I0625 16:07:59.052191 26225 auxprop.cpp:81] Request to lookup properties for user: 'test-principal' realm: 'lucid' server FQDN: 'lucid' SASL_AUXPROP_OVERRIDE: false SASL_AUXPROP_AUTHZID: true 
      I0625 16:07:59.052196 26225 auxprop.cpp:103] Skipping auxiliary property '*userPassword' since SASL_AUXPROP_AUTHZID == true
      I0625 16:07:59.052201 26225 auxprop.cpp:103] Skipping auxiliary property '*cmusaslsecretCRAM-MD5' since SASL_AUXPROP_AUTHZID == true
      I0625 16:07:59.052211 26225 authenticator.hpp:376] Authentication success
      I0625 16:07:59.052234 26225 authenticatee.hpp:305] Authentication success
      I0625 16:07:59.052255 26225 master.cpp:3539] Successfully authenticated principal 'test-principal' at scheduler-949b4db2-5d34-4876-bff8-a706ff8e4c72@127.0.1.1:47563
      I0625 16:07:59.052322 26225 sched.cpp:359] Successfully authenticated with master master@127.0.1.1:47563
      I0625 16:07:59.052335 26225 sched.cpp:478] Sending registration request to master@127.0.1.1:47563
      I0625 16:07:59.052376 26225 master.cpp:1241] Received registration request from scheduler-949b4db2-5d34-4876-bff8-a706ff8e4c72@127.0.1.1:47563
      I0625 16:07:59.052393 26225 master.cpp:1201] Authorizing framework principal 'test-principal' to receive offers for role '*'
      I0625 16:07:59.052490 26225 master.cpp:1300] Registering framework 20140625-160758-16842879-47563-26203-0000 at scheduler-949b4db2-5d34-4876-bff8-a706ff8e4c72@127.0.1.1:47563
      I0625 16:07:59.052580 26225 sched.cpp:409] Framework registered with 20140625-160758-16842879-47563-26203-0000
      I0625 16:07:59.052602 26225 sched.cpp:423] Scheduler::registered took 8220ns
      I0625 16:07:59.052634 26225 hierarchical_allocator_process.hpp:331] Added framework 20140625-160758-16842879-47563-26203-0000
      I0625 16:07:59.052642 26225 hierarchical_allocator_process.hpp:724] No resources available to allocate!
      I0625 16:07:59.052647 26225 hierarchical_allocator_process.hpp:686] Performed allocation for 0 slaves in 5845ns
      I0625 16:07:59.061020 26226 leveldb.cpp:343] Persisting action (16 bytes) to leveldb took 18.344559ms
      I0625 16:07:59.061079 26226 replica.cpp:676] Persisted action at 2
      I0625 16:07:59.061353 26226 replica.cpp:655] Replica received learned notice for position 2
      I0625 16:07:59.066944 26223 slave.cpp:955] Will retry registration in 32.268831ms if necessary
      I0625 16:07:59.067044 26223 master.cpp:2769] Ignoring register slave message from slave(117)@127.0.1.1:47563 (lucid) as admission is already in progress
      I0625 16:07:59.077322 26226 leveldb.cpp:343] Persisting action (18 bytes) to leveldb took 15.931404ms
      I0625 16:07:59.077587 26226 leveldb.cpp:401] Deleting ~1 keys from leveldb took 51614ns
      I0625 16:07:59.077605 26226 replica.cpp:676] Persisted action at 2
      I0625 16:07:59.077615 26226 replica.cpp:661] Replica learned TRUNCATE action at position 2
      I0625 16:07:59.078006 26226 log.cpp:680] Attempting to append 289 bytes to the log
      I0625 16:07:59.078053 26226 coordinator.cpp:340] Coordinator attempting to write APPEND action at position 3
      I0625 16:07:59.078299 26226 replica.cpp:508] Replica received write request for position 3
      I0625 16:07:59.094300 26226 leveldb.cpp:343] Persisting action (308 bytes) to leveldb took 15.951731ms
      I0625 16:07:59.094590 26226 replica.cpp:676] Persisted action at 3
      I0625 16:07:59.107091 26227 replica.cpp:655] Replica received learned notice for position 3
      I0625 16:07:59.107347 26224 slave.cpp:955] Will retry registration in 75.040916ms if necessary
      I0625 16:07:59.107457 26224 master.cpp:2769] Ignoring register slave message from slave(117)@127.0.1.1:47563 (lucid) as admission is already in progress
      I0625 16:07:59.118427 26227 leveldb.cpp:343] Persisting action (310 bytes) to leveldb took 11.273294ms
      I0625 16:07:59.118489 26227 replica.cpp:676] Persisted action at 3
      I0625 16:07:59.118499 26227 replica.cpp:661] Replica learned APPEND action at position 3
      I0625 16:07:59.118921 26227 registrar.cpp:479] Successfully updated 'registry'
      I0625 16:07:59.119007 26227 log.cpp:699] Attempting to truncate the log to 3
      I0625 16:07:59.119071 26227 master.cpp:2821] Registered slave 20140625-160758-16842879-47563-26203-0 at slave(117)@127.0.1.1:47563 (lucid)
      I0625 16:07:59.119088 26227 master.cpp:3967] Adding slave 20140625-160758-16842879-47563-26203-0 at slave(117)@127.0.1.1:47563 (lucid) with cpus(*):2; mem(*):1024; disk(*):1024; ports(*):[31000-32000]
      I0625 16:07:59.119207 26227 coordinator.cpp:340] Coordinator attempting to write TRUNCATE action at position 4
      I0625 16:07:59.119288 26227 slave.cpp:751] Registered with master master@127.0.1.1:47563; given slave ID 20140625-160758-16842879-47563-26203-0
      I0625 16:07:59.119355 26227 hierarchical_allocator_process.hpp:444] Added slave 20140625-160758-16842879-47563-26203-0 (lucid) with cpus(*):2; mem(*):1024; disk(*):1024; ports(*):[31000-32000] (and cpus(*):2; mem(*):1024; disk(*):1024; ports(*):[31000-32000] available)
      I0625 16:07:59.119410 26227 hierarchical_allocator_process.hpp:750] Offering cpus(*):2; mem(*):1024; disk(*):1024; ports(*):[31000-32000] on slave 20140625-160758-16842879-47563-26203-0 to framework 20140625-160758-16842879-47563-26203-0000
      I0625 16:07:59.119496 26227 hierarchical_allocator_process.hpp:706] Performed allocation for slave 20140625-160758-16842879-47563-26203-0 in 110886ns
      I0625 16:07:59.119549 26227 slave.cpp:2308] Received ping from slave-observer(97)@127.0.1.1:47563
      I0625 16:07:59.119604 26227 master.hpp:794] Adding offer 20140625-160758-16842879-47563-26203-0 with resources cpus(*):2; mem(*):1024; disk(*):1024; ports(*):[31000-32000] on slave 20140625-160758-16842879-47563-26203-0 (lucid)
      I0625 16:07:59.119644 26227 master.cpp:3446] Sending 1 offers to framework 20140625-160758-16842879-47563-26203-0000
      I0625 16:07:59.119900 26227 sched.cpp:546] Scheduler::resourceOffers took 125487ns
      I0625 16:07:59.120072 26227 master.hpp:804] Removing offer 20140625-160758-16842879-47563-26203-0 with resources cpus(*):2; mem(*):1024; disk(*):1024; ports(*):[31000-32000] on slave 20140625-160758-16842879-47563-26203-0 (lucid)
      I0625 16:07:59.120121 26227 master.cpp:2125] Processing reply for offers: [ 20140625-160758-16842879-47563-26203-0 ] on slave 20140625-160758-16842879-47563-26203-0 at slave(117)@127.0.1.1:47563 (lucid) for framework 20140625-160758-16842879-47563-26203-0000
      I0625 16:07:59.120141 26227 master.cpp:2211] Authorizing framework principal 'test-principal' to launch task 0 as user 'jenkins'
      I0625 16:07:59.120472 26227 master.hpp:766] Adding task 0 with resources cpus(*):1; mem(*):64 on slave 20140625-160758-16842879-47563-26203-0 (lucid)
      I0625 16:07:59.120501 26227 master.cpp:2277] Launching task 0 of framework 20140625-160758-16842879-47563-26203-0000 with resources cpus(*):1; mem(*):64 on slave 20140625-160758-16842879-47563-26203-0 at slave(117)@127.0.1.1:47563 (lucid)
      I0625 16:07:59.120621 26227 replica.cpp:508] Replica received write request for position 4
      I0625 16:07:59.121064 26224 slave.cpp:986] Got assigned task 0 for framework 20140625-160758-16842879-47563-26203-0000
      I0625 16:07:59.121354 26224 slave.cpp:1096] Launching task 0 for framework 20140625-160758-16842879-47563-26203-0000
      I0625 16:07:59.124053 26224 exec.cpp:131] Version: 0.20.0
      I0625 16:07:59.124178 26224 slave.cpp:1206] Queuing task '0' for executor default of framework '20140625-160758-16842879-47563-26203-0000
      I0625 16:07:59.124223 26224 slave.cpp:2453] Monitoring executor 'default' of framework '20140625-160758-16842879-47563-26203-0000' in container '42b56b06-5a72-41bb-8137-67926c47c9e9'
      I0625 16:07:59.124274 26224 hierarchical_allocator_process.hpp:546] Framework 20140625-160758-16842879-47563-26203-0000 left cpus(*):1; mem(*):960; disk(*):1024; ports(*):[31000-32000] unused on slave 20140625-160758-16842879-47563-26203-0
      I0625 16:07:59.124358 26224 hierarchical_allocator_process.hpp:588] Framework 20140625-160758-16842879-47563-26203-0000 filtered slave 20140625-160758-16842879-47563-26203-0 for 5secs
      I0625 16:07:59.124452 26224 exec.cpp:181] Executor started at: executor(36)@127.0.1.1:47563 with pid 26203
      I0625 16:07:59.124528 26224 slave.cpp:545] Successfully attached file '/tmp/MasterTest_OrphanTasks_R5G2iM/slaves/20140625-160758-16842879-47563-26203-0/frameworks/20140625-160758-16842879-47563-26203-0000/executors/default/runs/42b56b06-5a72-41bb-8137-67926c47c9e9'
      I0625 16:07:59.124557 26224 slave.cpp:1717] Got registration for executor 'default' of framework 20140625-160758-16842879-47563-26203-0000
      I0625 16:07:59.124634 26224 slave.cpp:1836] Flushing queued task 0 for executor 'default' of framework 20140625-160758-16842879-47563-26203-0000
      I0625 16:07:59.124703 26224 exec.cpp:205] Executor registered on slave 20140625-160758-16842879-47563-26203-0
      I0625 16:07:59.125650 26224 exec.cpp:217] Executor::registered took 15164ns
      I0625 16:07:59.125702 26224 exec.cpp:292] Executor asked to run task '0'
      I0625 16:07:59.125732 26224 exec.cpp:301] Executor::launchTask took 21us
      I0625 16:07:59.127326 26224 exec.cpp:524] Executor sending status update TASK_RUNNING (UUID: 77f58dcb-2ae3-45d2-a071-2c808614ab3a) for task 0 of framework 20140625-160758-16842879-47563-26203-0000
      I0625 16:07:59.127782 26223 slave.cpp:2071] Handling status update TASK_RUNNING (UUID: 77f58dcb-2ae3-45d2-a071-2c808614ab3a) for task 0 of framework 20140625-160758-16842879-47563-26203-0000 from executor(36)@127.0.1.1:47563
      I0625 16:07:59.127879 26223 status_update_manager.cpp:320] Received status update TASK_RUNNING (UUID: 77f58dcb-2ae3-45d2-a071-2c808614ab3a) for task 0 of framework 20140625-160758-16842879-47563-26203-0000
      I0625 16:07:59.127893 26223 status_update_manager.cpp:499] Creating StatusUpdate stream for task 0 of framework 20140625-160758-16842879-47563-26203-0000
      I0625 16:07:59.127962 26223 status_update_manager.cpp:373] Forwarding status update TASK_RUNNING (UUID: 77f58dcb-2ae3-45d2-a071-2c808614ab3a) for task 0 of framework 20140625-160758-16842879-47563-26203-0000 to master@127.0.1.1:47563
      I0625 16:07:59.128090 26223 master.cpp:3107] Status update TASK_RUNNING (UUID: 77f58dcb-2ae3-45d2-a071-2c808614ab3a) for task 0 of framework 20140625-160758-16842879-47563-26203-0000 from slave 20140625-160758-16842879-47563-26203-0 at slave(117)@127.0.1.1:47563 (lucid)
      I0625 16:07:59.128124 26223 slave.cpp:2229] Status update manager successfully handled status update TASK_RUNNING (UUID: 77f58dcb-2ae3-45d2-a071-2c808614ab3a) for task 0 of framework 20140625-160758-16842879-47563-26203-0000
      I0625 16:07:59.128134 26223 slave.cpp:2235] Sending acknowledgement for status update TASK_RUNNING (UUID: 77f58dcb-2ae3-45d2-a071-2c808614ab3a) for task 0 of framework 20140625-160758-16842879-47563-26203-0000 to executor(36)@127.0.1.1:47563
      I0625 16:07:59.128206 26223 sched.cpp:637] Scheduler::statusUpdate took 18758ns
      I0625 16:07:59.128267 26223 master.cpp:2631] Forwarding status update acknowledgement 77f58dcb-2ae3-45d2-a071-2c808614ab3a for task 0 of framework 20140625-160758-16842879-47563-26203-0000 to slave 20140625-160758-16842879-47563-26203-0 at slave(117)@127.0.1.1:47563 (lucid)
      I0625 16:07:59.128363 26223 status_update_manager.cpp:398] Received status update acknowledgement (UUID: 77f58dcb-2ae3-45d2-a071-2c808614ab3a) for task 0 of framework 20140625-160758-16842879-47563-26203-0000
      I0625 16:07:59.128412 26223 slave.cpp:1657] Status update manager successfully handled status update acknowledgement (UUID: 77f58dcb-2ae3-45d2-a071-2c808614ab3a) for task 0 of framework 20140625-160758-16842879-47563-26203-0000
      I0625 16:07:59.129055 26226 process.cpp:3335] Handling HTTP event for process 'master' with path: '/master/state.json'
      I0625 16:07:59.129106 26226 http.cpp:452] HTTP request for '/master/state.json'
      I0625 16:07:59.129865 26231 process.cpp:1037] Socket closed while receiving
      I0625 16:07:59.130645 26203 master.cpp:619] Master terminating
      I0625 16:07:59.130695 26203 master.hpp:784] Removing task 0 with resources cpus(*):1; mem(*):64 on slave 20140625-160758-16842879-47563-26203-0 (lucid)
      W0625 16:07:59.130738 26203 master.cpp:4253] Removing task 0 of framework 20140625-160758-16842879-47563-26203-0000 and slave 20140625-160758-16842879-47563-26203-0 in non-terminal state TASK_RUNNING
      I0625 16:07:59.131604 26226 slave.cpp:2315] master@127.0.1.1:47563 exited
      W0625 16:07:59.131628 26226 slave.cpp:2318] Master disconnected! Waiting for a new master to be elected
      I0625 16:07:59.132665 26224 exec.cpp:338] Executor received status update acknowledgement 77f58dcb-2ae3-45d2-a071-2c808614ab3a for task 0 of framework 20140625-160758-16842879-47563-26203-0000
      I0625 16:07:59.138859 26227 leveldb.cpp:343] Persisting action (16 bytes) to leveldb took 18.191907ms
      I0625 16:07:59.138913 26227 replica.cpp:676] Persisted action at 4
      I0625 16:07:59.190522 26203 leveldb.cpp:176] Opened db in 42.905633ms
      I0625 16:07:59.229253 26203 leveldb.cpp:183] Compacted db in 38.417135ms
      I0625 16:07:59.229497 26203 leveldb.cpp:198] Created db iterator in 9638ns
      I0625 16:07:59.229696 26203 leveldb.cpp:204] Seeked to beginning of db in 19529ns
      I0625 16:07:59.229907 26203 leveldb.cpp:273] Iterated through 5 keys in the db in 46457ns
      I0625 16:07:59.230082 26203 replica.cpp:741] Replica recovered with log positions 1 -> 4 with 0 holes and 1 unlearned
      I0625 16:07:59.230548 26226 recover.cpp:425] Starting replica recovery
      I0625 16:07:59.230633 26226 recover.cpp:451] Replica is in VOTING status
      I0625 16:07:59.230676 26226 recover.cpp:440] Recover process terminated
      I0625 16:07:59.234074 26224 master.cpp:288] Master 20140625-160759-16842879-47563-26203 (lucid) started on 127.0.1.1:47563
      I0625 16:07:59.234104 26224 master.cpp:325] Master only allowing authenticated frameworks to register
      I0625 16:07:59.234110 26224 master.cpp:330] Master only allowing authenticated slaves to register
      I0625 16:07:59.234118 26224 credentials.hpp:35] Loading credentials for authentication from '/tmp/MasterTest_OrphanTasks_s8r6NU/credentials'
      I0625 16:07:59.234195 26224 master.cpp:356] Authorization enabled
      I0625 16:07:59.234711 26224 hierarchical_allocator_process.hpp:301] Initializing hierarchical allocator process with master : master@127.0.1.1:47563
      I0625 16:07:59.234755 26224 master.cpp:122] No whitelist given. Advertising offers for all slaves
      I0625 16:07:59.234928 26224 master.cpp:1122] The newly elected leader is master@127.0.1.1:47563 with id 20140625-160759-16842879-47563-26203
      I0625 16:07:59.234941 26224 master.cpp:1135] Elected as the leading master!
      I0625 16:07:59.234948 26224 master.cpp:953] Recovering from registrar
      I0625 16:07:59.235004 26224 registrar.cpp:313] Recovering registrar
      I0625 16:07:59.235229 26224 log.cpp:656] Attempting to start the writer
      I0625 16:07:59.235543 26224 replica.cpp:474] Replica received implicit promise request with proposal 2
      I0625 16:07:59.246944 26224 leveldb.cpp:306] Persisting metadata (8 bytes) to leveldb took 11.362596ms
      I0625 16:07:59.247259 26224 replica.cpp:342] Persisted promised to 2
      I0625 16:07:59.257194 26230 coordinator.cpp:230] Coordinator attemping to fill missing position
      I0625 16:07:59.257627 26230 replica.cpp:375] Replica received explicit promise request for position 4 with proposal 3
      I0625 16:07:59.257693 26230 leveldb.cpp:438] Reading position from leveldb took 45817ns
      I0625 16:07:59.270833 26230 leveldb.cpp:343] Persisting action (16 bytes) to leveldb took 13.105094ms
      I0625 16:07:59.271126 26230 replica.cpp:676] Persisted action at 4
      I0625 16:07:59.271867 26230 replica.cpp:508] Replica received write request for position 4
      I0625 16:07:59.272071 26230 leveldb.cpp:438] Reading position from leveldb took 24316ns
      I0625 16:07:59.283974 26230 leveldb.cpp:343] Persisting action (16 bytes) to leveldb took 11.714283ms
      I0625 16:07:59.284390 26230 replica.cpp:676] Persisted action at 4
      I0625 16:07:59.287039 26226 replica.cpp:655] Replica received learned notice for position 4
      I0625 16:07:59.297823 26226 leveldb.cpp:343] Persisting action (18 bytes) to leveldb took 10.735977ms
      I0625 16:07:59.297930 26226 leveldb.cpp:401] Deleting ~2 keys from leveldb took 49963ns
      I0625 16:07:59.297947 26226 replica.cpp:676] Persisted action at 4
      I0625 16:07:59.297956 26226 replica.cpp:661] Replica learned TRUNCATE action at position 4
      I0625 16:07:59.298205 26226 log.cpp:672] Writer started with ending position 4
      I0625 16:07:59.298506 26226 leveldb.cpp:438] Reading position from leveldb took 37502ns
      I0625 16:07:59.298533 26226 leveldb.cpp:438] Reading position from leveldb took 6627ns
      I0625 16:07:59.298846 26226 registrar.cpp:346] Successfully fetched the registry (250B)
      I0625 16:07:59.298873 26226 registrar.cpp:422] Attempting to update the 'registry'
      I0625 16:07:59.300153 26226 log.cpp:680] Attempting to append 289 bytes to the log
      I0625 16:07:59.300201 26226 coordinator.cpp:340] Coordinator attempting to write APPEND action at position 5
      I0625 16:07:59.300427 26226 replica.cpp:508] Replica received write request for position 5
      I0625 16:07:59.313813 26226 leveldb.cpp:343] Persisting action (308 bytes) to leveldb took 13.351282ms
      I0625 16:07:59.314107 26226 replica.cpp:676] Persisted action at 5
      I0625 16:07:59.327086 26224 replica.cpp:655] Replica received learned notice for position 5
      I0625 16:07:59.338047 26224 leveldb.cpp:343] Persisting action (310 bytes) to leveldb took 10.688745ms
      I0625 16:07:59.338348 26224 replica.cpp:676] Persisted action at 5
      I0625 16:07:59.338714 26224 replica.cpp:661] Replica learned APPEND action at position 5
      I0625 16:07:59.339380 26224 registrar.cpp:479] Successfully updated 'registry'
      I0625 16:07:59.339436 26223 log.cpp:699] Attempting to truncate the log to 5
      I0625 16:07:59.339745 26223 coordinator.cpp:340] Coordinator attempting to write TRUNCATE action at position 6
      I0625 16:07:59.339993 26223 replica.cpp:508] Replica received write request for position 6
      I0625 16:07:59.340585 26224 registrar.cpp:372] Successfully recovered registrar
      I0625 16:07:59.341030 26228 master.cpp:980] Recovered 1 slaves from the Registry (250B) ; allowing 10mins for slaves to re-register
      I0625 16:07:59.342396 26225 slave.cpp:584] New master detected at master@127.0.1.1:47563
      I0625 16:07:59.342427 26225 slave.cpp:660] Authenticating with master master@127.0.1.1:47563
      I0625 16:07:59.342473 26225 slave.cpp:633] Detecting new master
      I0625 16:07:59.342536 26225 sched.cpp:229] Scheduler::disconnected took 8596ns
      I0625 16:07:59.342546 26225 sched.cpp:235] New master detected at master@127.0.1.1:47563
      I0625 16:07:59.342557 26225 sched.cpp:285] Authenticating with master master@127.0.1.1:47563
      I0625 16:07:59.342622 26225 status_update_manager.cpp:167] New master detected at master@127.0.1.1:47563
      I0625 16:07:59.342663 26225 authenticatee.hpp:128] Creating new client SASL connection
      I0625 16:07:59.342738 26225 authenticatee.hpp:128] Creating new client SASL connection
      I0625 16:07:59.342778 26225 master.cpp:843] Dropping 'mesos.internal.AuthenticateMessage' message since not recovered yet
      I0625 16:07:59.342794 26225 master.cpp:843] Dropping 'mesos.internal.AuthenticateMessage' message since not recovered yet
      I0625 16:07:59.353075 26223 leveldb.cpp:343] Persisting action (16 bytes) to leveldb took 13.023984ms
      I0625 16:07:59.353137 26223 replica.cpp:676] Persisted action at 6
      I0625 16:07:59.353423 26223 replica.cpp:655] Replica received learned notice for position 6
      I0625 16:07:59.364477 26223 leveldb.cpp:343] Persisting action (18 bytes) to leveldb took 11.017392ms
      I0625 16:07:59.364581 26223 leveldb.cpp:401] Deleting ~2 keys from leveldb took 48053ns
      I0625 16:07:59.364598 26223 replica.cpp:676] Persisted action at 6
      I0625 16:07:59.364608 26223 replica.cpp:661] Replica learned TRUNCATE action at position 6
      2014-06-25 16:08:00,021:26203(0x2b44fb781700):ZOO_ERROR@handle_socket_error_msg@1697: Socket [127.0.0.1:53090] zk retcode=-4, errno=111(Connection refused): server refused to accept the client
      2014-06-25 16:08:03,358:26203(0x2b44fb781700):ZOO_ERROR@handle_socket_error_msg@1697: Socket [127.0.0.1:53090] zk retcode=-4, errno=111(Connection refused): server refused to accept the client
      2014-06-25 16:08:06,695:26203(0x2b44fb781700):ZOO_ERROR@handle_socket_error_msg@1697: Socket [127.0.0.1:53090] zk retcode=-4, errno=111(Connection refused): server refused to accept the client
      tests/master_tests.cpp:1936: Failure
      Failed to wait 10secs for slaveReregisteredMessage
      I0625 16:08:09.357589 26203 master.cpp:619] Master terminating
      F0625 16:08:09.357723 26226 slave.cpp:571] Check failed: !_master.isDiscarded() 
      *** Check failure stack trace: ***
          @     0x2b44e959da2d  google::LogMessage::Fail()
          @     0x2b44e95a292f  google::LogMessage::SendToLog()
          @     0x2b44e959f90a  google::LogMessage::Flush()
          @     0x2b44e959fd1d  google::LogMessageFatal::~LogMessageFatal()
          @     0x2b44e90d82f0  mesos::internal::slave::Slave::detected()
          @           0x4dca91  std::tr1::_Function_handler<>::_M_invoke()
          @     0x2b44e94b42da  process::ProcessManager::resume()
          @     0x2b44e94b4bec  process::schedule()
          @     0x2b44eb4ba9ca  start_thread
          @     0x2b44eb7bb1cd  (unknown)
      make[3]: *** [check-local] Aborted
      

      Attachments

        Activity

          People

            Yifan Yifan Gu
            vinodkone Vinod Kone
            Votes:
            0 Vote for this issue
            Watchers:
            4 Start watching this issue

            Dates

              Created:
              Updated:
              Resolved: