Uploaded image for project: 'Mesos'
  1. Mesos
  2. MESOS-1273

SlaveRecoveryTest/0.RestartBeforeContainerizerLaunch is flaky

    XMLWordPrintableJSON

Details

    • Bug
    • Status: Resolved
    • Major
    • Resolution: Fixed
    • None
    • 0.19.0
    • test
    • None

    Description

      [ RUN ] SlaveRecoveryTest/0.RestartBeforeContainerizerLaunch
      I0430 22:06:03.219058 18018 leveldb.cpp:174] Opened db in 406.242037ms
      I0430 22:06:03.232270 18018 leveldb.cpp:181] Compacted db in 13.130925ms
      I0430 22:06:03.232339 18018 leveldb.cpp:196] Created db iterator in 6657ns
      I0430 22:06:03.232352 18018 leveldb.cpp:202] Seeked to beginning of db in 1563ns
      I0430 22:06:03.232367 18018 leveldb.cpp:271] Iterated through 0 keys in the db in 351ns
      I0430 22:06:03.232390 18018 replica.cpp:729] Replica recovered with log positions 0 -> 0 with 1 holes and 0 unlearned
      I0430 22:06:03.233069 18167 recover.cpp:425] Starting replica recovery
      I0430 22:06:03.233415 18166 recover.cpp:451] Replica is in EMPTY status
      I0430 22:06:03.233432 18165 master.cpp:266] Master 20140430-220603-1032504131-52571-18018 (juno.apache.org) started on 67.195.138.61:52571
      I0430 22:06:03.233453 18165 master.cpp:303] Master only allowing authenticated frameworks to register
      I0430 22:06:03.233458 18165 master.cpp:308] Master only allowing authenticated slaves to register
      I0430 22:06:03.233464 18165 credentials.hpp:35] Loading credentials for authentication
      W0430 22:06:03.233510 18165 credentials.hpp:48] Failed to stat credentials file 'file:///tmp/SlaveRecoveryTest_0_RestartBeforeContainerizerLaunch_FKd8m3/credentials': No such file or directory
      I0430 22:06:03.234156 18169 master.cpp:104] No whitelist given. Advertising offers for all slaves
      I0430 22:06:03.234200 18169 replica.cpp:626] Replica in EMPTY status received a broadcasted recover request
      I0430 22:06:03.234457 18165 recover.cpp:188] Received a recover response from a replica in EMPTY status
      I0430 22:06:03.234462 18166 hierarchical_allocator_process.hpp:302] Initializing hierarchical allocator process with master : master@67.195.138.61:52571
      I0430 22:06:03.234587 18167 master.cpp:922] The newly elected leader is master@67.195.138.61:52571 with id 20140430-220603-1032504131-52571-18018
      I0430 22:06:03.234599 18167 master.cpp:932] Elected as the leading master!
      I0430 22:06:03.234606 18167 master.cpp:753] Recovering from registrar
      I0430 22:06:03.234683 18167 registrar.cpp:275] Recovering registrar
      I0430 22:06:03.234959 18165 recover.cpp:542] Updating replica status to STARTING
      I0430 22:06:03.307873 18165 leveldb.cpp:304] Persisting metadata (8 bytes) to leveldb took 72.821123ms
      I0430 22:06:03.307939 18165 replica.cpp:320] Persisted replica status to STARTING
      I0430 22:06:03.308082 18165 recover.cpp:451] Replica is in STARTING status
      I0430 22:06:03.308593 18169 replica.cpp:626] Replica in STARTING status received a broadcasted recover request
      I0430 22:06:03.308756 18166 recover.cpp:188] Received a recover response from a replica in STARTING status
      I0430 22:06:03.308938 18165 recover.cpp:542] Updating replica status to VOTING
      I0430 22:06:03.318532 18172 leveldb.cpp:304] Persisting metadata (8 bytes) to leveldb took 9.307652ms
      I0430 22:06:03.318590 18172 replica.cpp:320] Persisted replica status to VOTING
      I0430 22:06:03.318701 18165 recover.cpp:556] Successfully joined the Paxos group
      I0430 22:06:03.318830 18165 recover.cpp:440] Recover process terminated
      I0430 22:06:03.319005 18165 log.cpp:656] Attempting to start the writer
      I0430 22:06:03.319419 18165 replica.cpp:474] Replica received implicit promise request with proposal 1
      I0430 22:06:03.326691 18165 leveldb.cpp:304] Persisting metadata (8 bytes) to leveldb took 7.236448ms
      I0430 22:06:03.326768 18165 replica.cpp:342] Persisted promised to 1
      I0430 22:06:03.327123 18165 coordinator.cpp:229] Coordinator attemping to fill missing position
      I0430 22:06:03.327606 18165 replica.cpp:375] Replica received explicit promise request for position 0 with proposal 2
      I0430 22:06:03.335028 18165 leveldb.cpp:341] Persisting action (8 bytes) to leveldb took 7.394042ms
      I0430 22:06:03.335083 18165 replica.cpp:664] Persisted action at 0
      I0430 22:06:03.335489 18165 replica.cpp:508] Replica received write request for position 0
      I0430 22:06:03.335522 18165 leveldb.cpp:436] Reading position from leveldb took 17556ns
      I0430 22:06:03.343369 18165 leveldb.cpp:341] Persisting action (14 bytes) to leveldb took 7.808785ms
      I0430 22:06:03.343457 18165 replica.cpp:664] Persisted action at 0
      I0430 22:06:03.343780 18165 replica.cpp:643] Replica received learned notice for position 0
      I0430 22:06:03.351706 18165 leveldb.cpp:341] Persisting action (16 bytes) to leveldb took 7.885058ms
      I0430 22:06:03.351765 18165 replica.cpp:664] Persisted action at 0
      I0430 22:06:03.351779 18165 replica.cpp:649] Replica learned NOP action at position 0
      I0430 22:06:03.352074 18165 log.cpp:672] Writer started with ending position 0
      I0430 22:06:03.352421 18165 leveldb.cpp:436] Reading position from leveldb took 21977ns
      I0430 22:06:03.353500 18165 registrar.cpp:306] Successfully recovered registrar
      I0430 22:06:03.353528 18165 registrar.cpp:377] Attempting to update the 'registry'
      I0430 22:06:03.354563 18165 log.cpp:680] Attempting to append 137 bytes to the log
      I0430 22:06:03.354615 18165 coordinator.cpp:339] Coordinator attempting to write APPEND action at position 1
      I0430 22:06:03.354886 18165 replica.cpp:508] Replica received write request for position 1
      I0430 22:06:03.365105 18165 leveldb.cpp:341] Persisting action (156 bytes) to leveldb took 10.187676ms
      I0430 22:06:03.365169 18165 replica.cpp:664] Persisted action at 1
      I0430 22:06:03.365488 18165 replica.cpp:643] Replica received learned notice for position 1
      I0430 22:06:03.373432 18165 leveldb.cpp:341] Persisting action (158 bytes) to leveldb took 7.910739ms
      I0430 22:06:03.373493 18165 replica.cpp:664] Persisted action at 1
      I0430 22:06:03.373507 18165 replica.cpp:649] Replica learned APPEND action at position 1
      I0430 22:06:03.373975 18165 registrar.cpp:423] Successfully updated 'registry'
      I0430 22:06:03.374055 18165 log.cpp:699] Attempting to truncate the log to 1
      I0430 22:06:03.374137 18165 master.cpp:780] Recovered 0 slaves from the Registry (99B) ; allowing 10mins for slaves to re-register
      I0430 22:06:03.374181 18165 coordinator.cpp:339] Coordinator attempting to write TRUNCATE action at position 2
      I0430 22:06:03.374495 18165 replica.cpp:508] Replica received write request for position 2
      I0430 22:06:03.393326 18165 leveldb.cpp:341] Persisting action (16 bytes) to leveldb took 18.798159ms
      I0430 22:06:03.393404 18165 replica.cpp:664] Persisted action at 2
      I0430 22:06:03.393772 18165 replica.cpp:643] Replica received learned notice for position 2
      I0430 22:06:03.406213 18169 slave.cpp:140] Slave started on 76)@67.195.138.61:52571
      I0430 22:06:03.406260 18169 credentials.hpp:35] Loading credentials for authentication
      W0430 22:06:03.406378 18169 credentials.hpp:48] Failed to stat credentials file 'file:///tmp/SlaveRecoveryTest_0_RestartBeforeContainerizerLaunch_SEYTCL/credential': No such file or directory
      I0430 22:06:03.406411 18169 slave.cpp:231] Slave using credential for: test-principal
      I0430 22:06:03.406564 18169 slave.cpp:244] Slave resources: cpus:2; mem:1024; disk:1024; ports:[31000-32000]
      I0430 22:06:03.406651 18169 slave.cpp:272] Slave hostname: juno.apache.org
      I0430 22:06:03.406664 18169 slave.cpp:273] Slave checkpoint: true
      I0430 22:06:03.407410 18169 state.cpp:33] Recovering state from '/tmp/SlaveRecoveryTest_0_RestartBeforeContainerizerLaunch_SEYTCL/meta'
      I0430 22:06:03.407572 18172 status_update_manager.cpp:193] Recovering status update manager
      I0430 22:06:03.407569 18165 leveldb.cpp:341] Persisting action (18 bytes) to leveldb took 13.773564ms
      I0430 22:06:03.407634 18165 leveldb.cpp:399] Deleting ~1 keys from leveldb took 33503ns
      I0430 22:06:03.407647 18165 replica.cpp:664] Persisted action at 2
      I0430 22:06:03.407660 18165 replica.cpp:649] Replica learned TRUNCATE action at position 2
      I0430 22:06:03.407707 18172 slave.cpp:2943] Finished recovery
      I0430 22:06:03.408071 18165 slave.cpp:525] New master detected at master@67.195.138.61:52571
      I0430 22:06:03.408095 18165 slave.cpp:585] Authenticating with master master@67.195.138.61:52571
      I0430 22:06:03.408143 18165 slave.cpp:558] Detecting new master
      I0430 22:06:03.408175 18165 status_update_manager.cpp:167] New master detected at master@67.195.138.61:52571
      I0430 22:06:03.408227 18172 authenticatee.hpp:128] Creating new client SASL connection
      I0430 22:06:03.408431 18172 master.cpp:2795] Authenticating slave(76)@67.195.138.61:52571
      I0430 22:06:03.408514 18172 authenticator.hpp:148] Creating new server SASL connection
      I0430 22:06:03.408579 18018 sched.cpp:121] Version: 0.19.0
      I0430 22:06:03.408622 18165 authenticatee.hpp:219] Received SASL authentication mechanisms: CRAM-MD5
      I0430 22:06:03.408639 18165 authenticatee.hpp:245] Attempting to authenticate with mechanism 'CRAM-MD5'
      I0430 22:06:03.408757 18165 authenticator.hpp:254] Received SASL authentication start
      I0430 22:06:03.408807 18165 authenticator.hpp:342] Authentication requires more steps
      I0430 22:06:03.408835 18165 authenticatee.hpp:265] Received SASL authentication step
      I0430 22:06:03.408872 18165 authenticator.hpp:282] Received SASL authentication step
      I0430 22:06:03.408891 18165 auxprop.cpp:81] Request to lookup properties for user: 'test-principal' realm: 'juno.apache.org' server FQDN: 'juno.apache.org' SASL_AUXPROP_VERIFY_AGAINST_HASH: false SASL_AUXPROP_OVERRIDE: false SASL_AUXPROP_AUTHZID: false
      I0430 22:06:03.408900 18165 auxprop.cpp:153] Looking up auxiliary property '*userPassword'
      I0430 22:06:03.408911 18165 auxprop.cpp:153] Looking up auxiliary property '*cmusaslsecretCRAM-MD5'
      I0430 22:06:03.408921 18165 auxprop.cpp:81] Request to lookup properties for user: 'test-principal' realm: 'juno.apache.org' server FQDN: 'juno.apache.org' SASL_AUXPROP_VERIFY_AGAINST_HASH: false SASL_AUXPROP_OVERRIDE: false SASL_AUXPROP_AUTHZID: true
      I0430 22:06:03.408926 18165 auxprop.cpp:103] Skipping auxiliary property '*userPassword' since SASL_AUXPROP_AUTHZID == true
      I0430 22:06:03.408931 18165 auxprop.cpp:103] Skipping auxiliary property '*cmusaslsecretCRAM-MD5' since SASL_AUXPROP_AUTHZID == true
      I0430 22:06:03.408943 18165 authenticator.hpp:334] Authentication success
      I0430 22:06:03.408968 18165 authenticatee.hpp:305] Authentication success
      I0430 22:06:03.408988 18165 master.cpp:2835] Successfully authenticated slave(76)@67.195.138.61:52571
      I0430 22:06:03.409032 18172 slave.cpp:642] Successfully authenticated with master master@67.195.138.61:52571
      I0430 22:06:03.409073 18172 slave.cpp:871] Will retry registration in 6.568752295secs if necessary
      I0430 22:06:03.409102 18167 sched.cpp:217] New master detected at master@67.195.138.61:52571
      I0430 22:06:03.409114 18167 sched.cpp:268] Authenticating with master master@67.195.138.61:52571
      I0430 22:06:03.409179 18172 authenticatee.hpp:128] Creating new client SASL connection
      I0430 22:06:03.409374 18165 master.cpp:2795] Authenticating scheduler(56)@67.195.138.61:52571
      I0430 22:06:03.409471 18165 registrar.cpp:377] Attempting to update the 'registry'
      I0430 22:06:03.410437 18165 authenticator.hpp:148] Creating new server SASL connection
      I0430 22:06:03.410662 18165 authenticatee.hpp:219] Received SASL authentication mechanisms: CRAM-MD5
      I0430 22:06:03.410676 18165 authenticatee.hpp:245] Attempting to authenticate with mechanism 'CRAM-MD5'
      I0430 22:06:03.410696 18165 log.cpp:680] Attempting to append 328 bytes to the log
      I0430 22:06:03.410739 18165 authenticator.hpp:254] Received SASL authentication start
      I0430 22:06:03.410763 18165 authenticator.hpp:342] Authentication requires more steps
      I0430 22:06:03.410784 18165 coordinator.cpp:339] Coordinator attempting to write APPEND action at position 3
      I0430 22:06:03.410857 18165 authenticatee.hpp:265] Received SASL authentication step
      I0430 22:06:03.410914 18165 authenticator.hpp:282] Received SASL authentication step
      I0430 22:06:03.410929 18165 auxprop.cpp:81] Request to lookup properties for user: 'test-principal' realm: 'juno.apache.org' server FQDN: 'juno.apache.org' SASL_AUXPROP_VERIFY_AGAINST_HASH: false SASL_AUXPROP_OVERRIDE: false SASL_AUXPROP_AUTHZID: false
      I0430 22:06:03.410936 18165 auxprop.cpp:153] Looking up auxiliary property '*userPassword'
      I0430 22:06:03.410944 18165 auxprop.cpp:153] Looking up auxiliary property '*cmusaslsecretCRAM-MD5'
      I0430 22:06:03.410953 18165 auxprop.cpp:81] Request to lookup properties for user: 'test-principal' realm: 'juno.apache.org' server FQDN: 'juno.apache.org' SASL_AUXPROP_VERIFY_AGAINST_HASH: false SASL_AUXPROP_OVERRIDE: false SASL_AUXPROP_AUTHZID: true
      I0430 22:06:03.410959 18165 auxprop.cpp:103] Skipping auxiliary property '*userPassword' since SASL_AUXPROP_AUTHZID == true
      I0430 22:06:03.410964 18165 auxprop.cpp:103] Skipping auxiliary property '*cmusaslsecretCRAM-MD5' since SASL_AUXPROP_AUTHZID == true
      I0430 22:06:03.410972 18165 authenticator.hpp:334] Authentication success
      I0430 22:06:03.411017 18165 authenticatee.hpp:305] Authentication success
      I0430 22:06:03.411036 18165 master.cpp:2835] Successfully authenticated scheduler(56)@67.195.138.61:52571
      I0430 22:06:03.411123 18165 sched.cpp:342] Successfully authenticated with master master@67.195.138.61:52571
      I0430 22:06:03.411134 18165 sched.cpp:461] Sending registration request to master@67.195.138.61:52571
      I0430 22:06:03.411216 18165 master.cpp:981] Received registration request from scheduler(56)@67.195.138.61:52571
      I0430 22:06:03.411242 18165 master.cpp:999] Registering framework 20140430-220603-1032504131-52571-18018-0000 at scheduler(56)@67.195.138.61:52571
      I0430 22:06:03.411341 18165 sched.cpp:392] Framework registered with 20140430-220603-1032504131-52571-18018-0000
      I0430 22:06:03.411360 18165 sched.cpp:406] Scheduler::registered took 9804ns
      I0430 22:06:03.411393 18165 hierarchical_allocator_process.hpp:332] Added framework 20140430-220603-1032504131-52571-18018-0000
      I0430 22:06:03.411401 18165 hierarchical_allocator_process.hpp:726] No resources available to allocate!
      I0430 22:06:03.411407 18165 hierarchical_allocator_process.hpp:688] Performed allocation for 0 slaves in 6411ns
      I0430 22:06:03.411429 18165 replica.cpp:508] Replica received write request for position 3
      I0430 22:06:03.418311 18165 leveldb.cpp:341] Persisting action (347 bytes) to leveldb took 6.855862ms
      I0430 22:06:03.418380 18165 replica.cpp:664] Persisted action at 3
      I0430 22:06:03.418699 18165 replica.cpp:643] Replica received learned notice for position 3
      I0430 22:06:03.426651 18165 leveldb.cpp:341] Persisting action (349 bytes) to leveldb took 7.921913ms
      I0430 22:06:03.426723 18165 replica.cpp:664] Persisted action at 3
      I0430 22:06:03.426735 18165 replica.cpp:649] Replica learned APPEND action at position 3
      I0430 22:06:03.427196 18165 registrar.cpp:423] Successfully updated 'registry'
      I0430 22:06:03.427320 18172 log.cpp:699] Attempting to truncate the log to 3
      I0430 22:06:03.427362 18172 coordinator.cpp:339] Coordinator attempting to write TRUNCATE action at position 4
      I0430 22:06:03.427639 18172 replica.cpp:508] Replica received write request for position 4
      I0430 22:06:03.427811 18165 master.cpp:2169] Admitted slave on juno.apache.org at slave(76)@67.195.138.61:52571
      I0430 22:06:03.427827 18165 master.cpp:3283] Adding slave 20140430-220603-1032504131-52571-18018-0 at juno.apache.org with cpus:2; mem:1024; disk:1024; ports:[31000-32000]
      I0430 22:06:03.427942 18165 slave.cpp:675] Registered with master master@67.195.138.61:52571; given slave ID 20140430-220603-1032504131-52571-18018-0
      I0430 22:06:03.428136 18165 slave.cpp:688] Checkpointing SlaveInfo to '/tmp/SlaveRecoveryTest_0_RestartBeforeContainerizerLaunch_SEYTCL/meta/slaves/20140430-220603-1032504131-52571-18018-0/slave.info'
      I0430 22:06:03.428302 18165 hierarchical_allocator_process.hpp:445] Added slave 20140430-220603-1032504131-52571-18018-0 (juno.apache.org) with cpus:2; mem:1024; disk:1024; ports:[31000-32000] (and cpus:2; mem:1024; disk:1024; ports:[31000-32000] available)
      I0430 22:06:03.428377 18165 hierarchical_allocator_process.hpp:752] Offering cpus:2; mem:1024; disk:1024; ports:[31000-32000] on slave 20140430-220603-1032504131-52571-18018-0 to framework 20140430-220603-1032504131-52571-18018-0000
      I0430 22:06:03.428491 18165 hierarchical_allocator_process.hpp:708] Performed allocation for slave 20140430-220603-1032504131-52571-18018-0 in 159129ns
      I0430 22:06:03.428561 18165 master.hpp:586] Adding offer 20140430-220603-1032504131-52571-18018-0 with resources cpus:2; mem:1024; disk:1024; ports:[31000-32000] on slave 20140430-220603-1032504131-52571-18018-0 (juno.apache.org)
      I0430 22:06:03.428606 18165 master.cpp:2744] Sending 1 offers to framework 20140430-220603-1032504131-52571-18018-0000
      I0430 22:06:03.428735 18165 sched.cpp:529] Scheduler::resourceOffers took 21533ns
      I0430 22:06:03.429956 18169 master.hpp:596] Removing offer 20140430-220603-1032504131-52571-18018-0 with resources cpus:2; mem:1024; disk:1024; ports:[31000-32000] on slave 20140430-220603-1032504131-52571-18018-0 (juno.apache.org)
      I0430 22:06:03.430007 18169 master.cpp:1806] Processing reply for offers: [ 20140430-220603-1032504131-52571-18018-0 ] on slave 20140430-220603-1032504131-52571-18018-0 (juno.apache.org) for framework 20140430-220603-1032504131-52571-18018-0000
      I0430 22:06:03.430095 18169 master.hpp:558] Adding task f8c08084-068f-4d26-81bd-3a8d8efea7f5 with resources cpus:2; mem:1024; disk:1024; ports:[31000-32000] on slave 20140430-220603-1032504131-52571-18018-0 (juno.apache.org)
      I0430 22:06:03.430146 18169 master.cpp:2919] Launching task f8c08084-068f-4d26-81bd-3a8d8efea7f5 of framework 20140430-220603-1032504131-52571-18018-0000 with resources cpus:2; mem:1024; disk:1024; ports:[31000-32000] on slave 20140430-220603-1032504131-52571-18018-0 (juno.apache.org)
      I0430 22:06:03.430274 18169 slave.cpp:905] Got assigned task f8c08084-068f-4d26-81bd-3a8d8efea7f5 for framework 20140430-220603-1032504131-52571-18018-0000
      I0430 22:06:03.430343 18169 slave.cpp:3082] Checkpointing FrameworkInfo to '/tmp/SlaveRecoveryTest_0_RestartBeforeContainerizerLaunch_SEYTCL/meta/slaves/20140430-220603-1032504131-52571-18018-0/frameworks/20140430-220603-1032504131-52571-18018-0000/framework.info'
      I0430 22:06:03.430466 18169 slave.cpp:3089] Checkpointing framework pid 'scheduler(56)@67.195.138.61:52571' to '/tmp/SlaveRecoveryTest_0_RestartBeforeContainerizerLaunch_SEYTCL/meta/slaves/20140430-220603-1032504131-52571-18018-0/frameworks/20140430-220603-1032504131-52571-18018-0000/framework.pid'
      I0430 22:06:03.430691 18169 slave.cpp:1015] Launching task f8c08084-068f-4d26-81bd-3a8d8efea7f5 for framework 20140430-220603-1032504131-52571-18018-0000
      I0430 22:06:03.431721 18169 slave.cpp:3396] Checkpointing ExecutorInfo to '/tmp/SlaveRecoveryTest_0_RestartBeforeContainerizerLaunch_SEYTCL/meta/slaves/20140430-220603-1032504131-52571-18018-0/frameworks/20140430-220603-1032504131-52571-18018-0000/executors/f8c08084-068f-4d26-81bd-3a8d8efea7f5/executor.info'
      I0430 22:06:03.432059 18169 slave.cpp:3493] Checkpointing TaskInfo to '/tmp/SlaveRecoveryTest_0_RestartBeforeContainerizerLaunch_SEYTCL/meta/slaves/20140430-220603-1032504131-52571-18018-0/frameworks/20140430-220603-1032504131-52571-18018-0000/executors/f8c08084-068f-4d26-81bd-3a8d8efea7f5/runs/4938e23d-ede6-4fdf-811d-aa7bfcc8490b/tasks/f8c08084-068f-4d26-81bd-3a8d8efea7f5/task.info'
      I0430 22:06:03.432195 18169 slave.cpp:1125] Queuing task 'f8c08084-068f-4d26-81bd-3a8d8efea7f5' for executor f8c08084-068f-4d26-81bd-3a8d8efea7f5 of framework '20140430-220603-1032504131-52571-18018-0000
      E0430 22:06:03.432241 18169 slave.cpp:2324] Termination of executor 'f8c08084-068f-4d26-81bd-3a8d8efea7f5' of framework '20140430-220603-1032504131-52571-18018-0000' failed: Unknown container: 4938e23d-ede6-4fdf-811d-aa7bfcc8490b
      E0430 22:06:03.432623 18168 slave.cpp:2613] Failed to unmonitor container for executor f8c08084-068f-4d26-81bd-3a8d8efea7f5 of framework 20140430-220603-1032504131-52571-18018-0000: Not monitored
      I0430 22:06:03.433462 18169 slave.cpp:1953] Handling status update TASK_FAILED (UUID: 95b4c752-f44f-483d-9d5b-f96bd3943e6a) for task f8c08084-068f-4d26-81bd-3a8d8efea7f5 of framework 20140430-220603-1032504131-52571-18018-0000 from @0.0.0.0:0
      I0430 22:06:03.433483 18169 slave.cpp:3444] Terminating task f8c08084-068f-4d26-81bd-3a8d8efea7f5
      I0430 22:06:03.433579 18169 slave.cpp:414] Slave terminating
      I0430 22:06:03.433617 18169 status_update_manager.cpp:320] Received status update TASK_FAILED (UUID: 95b4c752-f44f-483d-9d5b-f96bd3943e6a) for task f8c08084-068f-4d26-81bd-3a8d8efea7f5 of framework 20140430-220603-1032504131-52571-18018-0000
      I0430 22:06:03.433630 18169 status_update_manager.cpp:499] Creating StatusUpdate stream for task f8c08084-068f-4d26-81bd-3a8d8efea7f5 of framework 20140430-220603-1032504131-52571-18018-0000
      I0430 22:06:03.433740 18169 status_update_manager.hpp:342] Checkpointing UPDATE for status update TASK_FAILED (UUID: 95b4c752-f44f-483d-9d5b-f96bd3943e6a) for task f8c08084-068f-4d26-81bd-3a8d8efea7f5 of framework 20140430-220603-1032504131-52571-18018-0000
      I0430 22:06:03.433837 18168 master.cpp:689] Slave 20140430-220603-1032504131-52571-18018-0 (juno.apache.org) disconnected
      I0430 22:06:03.433846 18168 master.cpp:1263] Disconnecting slave 20140430-220603-1032504131-52571-18018-0
      I0430 22:06:03.433876 18168 hierarchical_allocator_process.hpp:484] Slave 20140430-220603-1032504131-52571-18018-0 disconnected
      I0430 22:06:03.440166 18172 leveldb.cpp:341] Persisting action (16 bytes) to leveldb took 12.495641ms
      I0430 22:06:03.440225 18172 replica.cpp:664] Persisted action at 4
      I0430 22:06:03.440526 18172 replica.cpp:643] Replica received learned notice for position 4
      I0430 22:06:03.580061 18172 leveldb.cpp:341] Persisting action (18 bytes) to leveldb took 139.507724ms
      I0430 22:06:03.580153 18172 leveldb.cpp:399] Deleting ~2 keys from leveldb took 33527ns
      I0430 22:06:03.580168 18172 replica.cpp:664] Persisted action at 4
      I0430 22:06:03.580178 18172 replica.cpp:649] Replica learned TRUNCATE action at position 4
      I0430 22:06:03.658355 18169 status_update_manager.cpp:373] Forwarding status update TASK_FAILED (UUID: 95b4c752-f44f-483d-9d5b-f96bd3943e6a) for task f8c08084-068f-4d26-81bd-3a8d8efea7f5 of framework 20140430-220603-1032504131-52571-18018-0000 to master@67.195.138.61:52571
      I0430 22:06:03.658665 18169 master.cpp:2444] Status update TASK_FAILED (UUID: 95b4c752-f44f-483d-9d5b-f96bd3943e6a) for task f8c08084-068f-4d26-81bd-3a8d8efea7f5 of framework 20140430-220603-1032504131-52571-18018-0000 from slave(76)@67.195.138.61:52571
      I0430 22:06:03.658728 18169 master.hpp:576] Removing task f8c08084-068f-4d26-81bd-3a8d8efea7f5 with resources cpus:2; mem:1024; disk:1024; ports:[31000-32000] on slave 20140430-220603-1032504131-52571-18018-0 (juno.apache.org)

      GMOCK WARNING:
      Uninteresting mock function call - returning directly.
      Function call: statusUpdate(0x7fff801a6b50, @0x2af3cc005bc0 72-byte object <D0-30 BB-A9 F3-2A 00-00 00-00 00-00 00-00 00-00 D0-F6 01-CC F3-2A 00-00 B0-59 02-CC F3-2A 00-00 58-0C CB-00 00-00 00-00 C0-A2 01-CC F3-2A 00-00 7A-AA 1B-CC AB-DA D4-41 03-00 00-00 00-00 00-00 37-00 00-00 6E-65 72-69>)
      Stack trace:
      I0430 22:06:03.658898 18169 sched.cpp:620] Scheduler::statusUpdate took 56656ns
      I0430 22:06:03.659023 18169 hierarchical_allocator_process.hpp:637] Recovered cpus:2; mem:1024; disk:1024; ports:[31000-32000] (total allocatable: cpus:2; mem:1024; disk:1024; ports:[31000-32000]) on slave 20140430-220603-1032504131-52571-18018-0 from framework 20140430-220603-1032504131-52571-18018-0000
      I0430 22:06:03.661164 18168 slave.cpp:140] Slave started on 77)@67.195.138.61:52571
      I0430 22:06:03.661175 18168 credentials.hpp:35] Loading credentials for authentication
      W0430 22:06:03.661217 18168 credentials.hpp:48] Failed to stat credentials file 'file:///tmp/SlaveRecoveryTest_0_RestartBeforeContainerizerLaunch_SEYTCL/credential': No such file or directory
      I0430 22:06:03.661238 18168 slave.cpp:231] Slave using credential for: test-principal
      I0430 22:06:03.661330 18168 slave.cpp:244] Slave resources: cpus:2; mem:1024; disk:1024; ports:[31000-32000]
      I0430 22:06:03.661376 18168 slave.cpp:272] Slave hostname: juno.apache.org
      I0430 22:06:03.661384 18168 slave.cpp:273] Slave checkpoint: true
      I0430 22:06:03.661877 18168 state.cpp:33] Recovering state from '/tmp/SlaveRecoveryTest_0_RestartBeforeContainerizerLaunch_SEYTCL/meta'
      W0430 22:06:03.662406 18168 state.cpp:426] Failed to find executor forked pid file '/tmp/SlaveRecoveryTest_0_RestartBeforeContainerizerLaunch_SEYTCL/meta/slaves/20140430-220603-1032504131-52571-18018-0/frameworks/20140430-220603-1032504131-52571-18018-0000/executors/f8c08084-068f-4d26-81bd-3a8d8efea7f5/runs/4938e23d-ede6-4fdf-811d-aa7bfcc8490b/pids/forked.pid'
      I0430 22:06:03.662811 18168 slave.cpp:3011] Recovering framework 20140430-220603-1032504131-52571-18018-0000
      I0430 22:06:03.662827 18168 slave.cpp:3246] Recovering executor 'f8c08084-068f-4d26-81bd-3a8d8efea7f5' of framework 20140430-220603-1032504131-52571-18018-0000
      I0430 22:06:03.662894 18168 slave.cpp:3444] Terminating task f8c08084-068f-4d26-81bd-3a8d8efea7f5
      I0430 22:06:03.663360 18168 status_update_manager.cpp:193] Recovering status update manager
      I0430 22:06:03.663368 18168 status_update_manager.cpp:201] Recovering executor 'f8c08084-068f-4d26-81bd-3a8d8efea7f5' of framework 20140430-220603-1032504131-52571-18018-0000
      I0430 22:06:03.663382 18168 status_update_manager.cpp:499] Creating StatusUpdate stream for task f8c08084-068f-4d26-81bd-3a8d8efea7f5 of framework 20140430-220603-1032504131-52571-18018-0000
      I0430 22:06:03.663471 18168 status_update_manager.hpp:306] Replaying status update stream for task f8c08084-068f-4d26-81bd-3a8d8efea7f5
      W0430 22:06:03.663485 18168 status_update_manager.cpp:381] Not forwarding status update TASK_FAILED (UUID: 95b4c752-f44f-483d-9d5b-f96bd3943e6a) for task f8c08084-068f-4d26-81bd-3a8d8efea7f5 of framework 20140430-220603-1032504131-52571-18018-0000 because no master is elected yet
      I0430 22:06:03.663620 18168 slave.cpp:486] Successfully attached file '/tmp/SlaveRecoveryTest_0_RestartBeforeContainerizerLaunch_SEYTCL/slaves/20140430-220603-1032504131-52571-18018-0/frameworks/20140430-220603-1032504131-52571-18018-0000/executors/f8c08084-068f-4d26-81bd-3a8d8efea7f5/runs/4938e23d-ede6-4fdf-811d-aa7bfcc8490b'
      I0430 22:06:03.663702 18168 slave.cpp:2892] Unable to reconnect to executor 'f8c08084-068f-4d26-81bd-3a8d8efea7f5' of framework 20140430-220603-1032504131-52571-18018-0000 because no libprocess PID was found
      E0430 22:06:03.663727 18168 slave.cpp:2324] Termination of executor 'f8c08084-068f-4d26-81bd-3a8d8efea7f5' of framework '20140430-220603-1032504131-52571-18018-0000' failed: Unknown container: 4938e23d-ede6-4fdf-811d-aa7bfcc8490b
      E0430 22:06:03.663768 18168 slave.cpp:2613] Failed to unmonitor container for executor f8c08084-068f-4d26-81bd-3a8d8efea7f5 of framework 20140430-220603-1032504131-52571-18018-0000: Not monitored
      I0430 22:06:04.235726 18168 hierarchical_allocator_process.hpp:688] Performed allocation for 1 slaves in 84246ns
      I0430 22:06:05.236497 18168 hierarchical_allocator_process.hpp:688] Performed allocation for 1 slaves in 85055ns
      I0430 22:06:05.663892 18168 slave.cpp:1902] Cleaning up un-reregistered executors
      I0430 22:06:05.663956 18168 slave.cpp:2943] Finished recovery
      I0430 22:06:05.664314 18168 slave.cpp:525] New master detected at master@67.195.138.61:52571
      I0430 22:06:05.664341 18168 slave.cpp:585] Authenticating with master master@67.195.138.61:52571
      I0430 22:06:05.664433 18166 status_update_manager.cpp:167] New master detected at master@67.195.138.61:52571
      W0430 22:06:05.664448 18166 status_update_manager.cpp:181] Resending status update TASK_FAILED (UUID: 95b4c752-f44f-483d-9d5b-f96bd3943e6a) for task f8c08084-068f-4d26-81bd-3a8d8efea7f5 of framework 20140430-220603-1032504131-52571-18018-0000
      I0430 22:06:05.664461 18166 status_update_manager.cpp:373] Forwarding status update TASK_FAILED (UUID: 95b4c752-f44f-483d-9d5b-f96bd3943e6a) for task f8c08084-068f-4d26-81bd-3a8d8efea7f5 of framework 20140430-220603-1032504131-52571-18018-0000 to master@67.195.138.61:52571
      W0430 22:06:05.664608 18166 master.cpp:2437] Status update TASK_FAILED (UUID: 95b4c752-f44f-483d-9d5b-f96bd3943e6a) for task f8c08084-068f-4d26-81bd-3a8d8efea7f5 of framework 20140430-220603-1032504131-52571-18018-0000 from slave(77)@67.195.138.61:52571 (juno.apache.org): error, couldn't lookup task
      I0430 22:06:05.664674 18166 sched.cpp:620] Scheduler::statusUpdate took 20006ns
      ../../src/tests/slave_recovery_tests.cpp:3161: Failure
      Value of: status.get().state()
      Actual: TASK_FAILED
      Expected: TASK_LOST
      I0430 22:06:05.665145 18168 slave.cpp:558] Detecting new master
      I0430 22:06:05.665238 18168 authenticatee.hpp:128] Creating new client SASL connection
      I0430 22:06:05.665369 18168 status_update_manager.cpp:398] Received status update acknowledgement (UUID: 95b4c752-f44f-483d-9d5b-f96bd3943e6a) for task f8c08084-068f-4d26-81bd-3a8d8efea7f5 of framework 20140430-220603-1032504131-52571-18018-0000
      I0430 22:06:05.665387 18168 status_update_manager.hpp:342] Checkpointing ACK for status update TASK_FAILED (UUID: 95b4c752-f44f-483d-9d5b-f96bd3943e6a) for task f8c08084-068f-4d26-81bd-3a8d8efea7f5 of framework 20140430-220603-1032504131-52571-18018-0000
      I0430 22:06:05.665571 18166 master.cpp:2795] Authenticating slave(77)@67.195.138.61:52571
      I0430 22:06:05.665649 18166 authenticator.hpp:148] Creating new server SASL connection
      I0430 22:06:05.665740 18166 authenticatee.hpp:219] Received SASL authentication mechanisms: CRAM-MD5
      I0430 22:06:05.665755 18166 authenticatee.hpp:245] Attempting to authenticate with mechanism 'CRAM-MD5'
      I0430 22:06:05.665777 18166 authenticator.hpp:254] Received SASL authentication start
      I0430 22:06:05.665825 18166 authenticator.hpp:342] Authentication requires more steps
      I0430 22:06:05.665846 18166 authenticatee.hpp:265] Received SASL authentication step
      I0430 22:06:05.665882 18166 authenticator.hpp:282] Received SASL authentication step
      I0430 22:06:05.665899 18166 auxprop.cpp:81] Request to lookup properties for user: 'test-principal' realm: 'juno.apache.org' server FQDN: 'juno.apache.org' SASL_AUXPROP_VERIFY_AGAINST_HASH: false SASL_AUXPROP_OVERRIDE: false SASL_AUXPROP_AUTHZID: false
      I0430 22:06:05.665905 18166 auxprop.cpp:153] Looking up auxiliary property '*userPassword'
      I0430 22:06:05.665917 18166 auxprop.cpp:153] Looking up auxiliary property '*cmusaslsecretCRAM-MD5'
      I0430 22:06:05.665927 18166 auxprop.cpp:81] Request to lookup properties for user: 'test-principal' realm: 'juno.apache.org' server FQDN: 'juno.apache.org' SASL_AUXPROP_VERIFY_AGAINST_HASH: false SASL_AUXPROP_OVERRIDE: false SASL_AUXPROP_AUTHZID: true
      I0430 22:06:05.665933 18166 auxprop.cpp:103] Skipping auxiliary property '*userPassword' since SASL_AUXPROP_AUTHZID == true
      I0430 22:06:05.665938 18166 auxprop.cpp:103] Skipping auxiliary property '*cmusaslsecretCRAM-MD5' since SASL_AUXPROP_AUTHZID == true
      I0430 22:06:05.665949 18166 authenticator.hpp:334] Authentication success
      I0430 22:06:05.665973 18166 authenticatee.hpp:305] Authentication success
      I0430 22:06:05.665993 18166 master.cpp:2835] Successfully authenticated slave(77)@67.195.138.61:52571
      I0430 22:06:05.666055 18166 slave.cpp:642] Successfully authenticated with master master@67.195.138.61:52571
      I0430 22:06:05.666105 18166 slave.cpp:871] Will retry registration in 6.787378006secs if necessary
      W0430 22:06:05.666270 18166 master.cpp:2244] Slave at slave(77)@67.195.138.61:52571 (juno.apache.org) is being allowed to re-register with an already in use id (20140430-220603-1032504131-52571-18018-0)
      I0430 22:06:05.666331 18166 slave.cpp:725] Re-registered with master master@67.195.138.61:52571
      I0430 22:06:05.666352 18166 hierarchical_allocator_process.hpp:498] Slave 20140430-220603-1032504131-52571-18018-0 reconnected
      I0430 22:06:05.924748 18168 status_update_manager.cpp:530] Cleaning up status update stream for task f8c08084-068f-4d26-81bd-3a8d8efea7f5 of framework 20140430-220603-1032504131-52571-18018-0000
      I0430 22:06:05.924958 18168 slave.cpp:1538] Status update manager successfully handled status update acknowledgement (UUID: 95b4c752-f44f-483d-9d5b-f96bd3943e6a) for task f8c08084-068f-4d26-81bd-3a8d8efea7f5 of framework 20140430-220603-1032504131-52571-18018-0000
      I0430 22:06:05.924981 18168 slave.cpp:3468] Completing task f8c08084-068f-4d26-81bd-3a8d8efea7f5
      I0430 22:06:05.924993 18168 slave.cpp:2479] Cleaning up executor 'f8c08084-068f-4d26-81bd-3a8d8efea7f5' of framework 20140430-220603-1032504131-52571-18018-0000
      I0430 22:06:05.925227 18166 gc.cpp:56] Scheduling '/tmp/SlaveRecoveryTest_0_RestartBeforeContainerizerLaunch_SEYTCL/slaves/20140430-220603-1032504131-52571-18018-0/frameworks/20140430-220603-1032504131-52571-18018-0000/executors/f8c08084-068f-4d26-81bd-3a8d8efea7f5/runs/4938e23d-ede6-4fdf-811d-aa7bfcc8490b' for gc 6.99998929239407days in the future
      I0430 22:06:05.925254 18168 slave.cpp:2554] Cleaning up framework 20140430-220603-1032504131-52571-18018-0000
      I0430 22:06:05.925346 18166 gc.cpp:56] Scheduling '/tmp/SlaveRecoveryTest_0_RestartBeforeContainerizerLaunch_SEYTCL/slaves/20140430-220603-1032504131-52571-18018-0/frameworks/20140430-220603-1032504131-52571-18018-0000/executors/f8c08084-068f-4d26-81bd-3a8d8efea7f5' for gc 6.9999892917837days in the future
      I0430 22:06:05.925370 18168 status_update_manager.cpp:282] Closing status update streams for framework 20140430-220603-1032504131-52571-18018-0000
      I0430 22:06:05.925389 18166 gc.cpp:56] Scheduling '/tmp/SlaveRecoveryTest_0_RestartBeforeContainerizerLaunch_SEYTCL/meta/slaves/20140430-220603-1032504131-52571-18018-0/frameworks/20140430-220603-1032504131-52571-18018-0000/executors/f8c08084-068f-4d26-81bd-3a8d8efea7f5/runs/4938e23d-ede6-4fdf-811d-aa7bfcc8490b' for gc 6.99998929146963days in the future
      I0430 22:06:05.925428 18166 gc.cpp:56] Scheduling '/tmp/SlaveRecoveryTest_0_RestartBeforeContainerizerLaunch_SEYTCL/meta/slaves/20140430-220603-1032504131-52571-18018-0/frameworks/20140430-220603-1032504131-52571-18018-0000/executors/f8c08084-068f-4d26-81bd-3a8d8efea7f5' for gc 6.99998929119111days in the future
      I0430 22:06:05.925461 18166 gc.cpp:56] Scheduling '/tmp/SlaveRecoveryTest_0_RestartBeforeContainerizerLaunch_SEYTCL/slaves/20140430-220603-1032504131-52571-18018-0/frameworks/20140430-220603-1032504131-52571-18018-0000' for gc 6.99998929039407days in the future
      I0430 22:06:05.925492 18166 gc.cpp:56] Scheduling '/tmp/SlaveRecoveryTest_0_RestartBeforeContainerizerLaunch_SEYTCL/meta/slaves/20140430-220603-1032504131-52571-18018-0/frameworks/20140430-220603-1032504131-52571-18018-0000' for gc 6.99998929009482days in the future
      I0430 22:06:06.237241 18168 hierarchical_allocator_process.hpp:752] Offering cpus:2; mem:1024; disk:1024; ports:[31000-32000] on slave 20140430-220603-1032504131-52571-18018-0 to framework 20140430-220603-1032504131-52571-18018-0000
      I0430 22:06:06.237427 18168 hierarchical_allocator_process.hpp:688] Performed allocation for 1 slaves in 264218ns
      I0430 22:06:06.237565 18166 master.hpp:586] Adding offer 20140430-220603-1032504131-52571-18018-1 with resources cpus:2; mem:1024; disk:1024; ports:[31000-32000] on slave 20140430-220603-1032504131-52571-18018-0 (juno.apache.org)
      I0430 22:06:06.237670 18166 master.cpp:2744] Sending 1 offers to framework 20140430-220603-1032504131-52571-18018-0000
      I0430 22:06:06.237844 18168 sched.cpp:529] Scheduler::resourceOffers took 11363ns
      I0430 22:06:07.238281 18166 hierarchical_allocator_process.hpp:688] Performed allocation for 1 slaves in 50695ns
      I0430 22:06:08.234349 18166 master.cpp:104] No whitelist given. Advertising offers for all slaves
      I0430 22:06:08.238426 18166 hierarchical_allocator_process.hpp:688] Performed allocation for 1 slaves in 40043ns
      I0430 22:06:09.238927 18166 hierarchical_allocator_process.hpp:688] Performed allocation for 1 slaves in 46567ns
      I0430 22:06:10.239409 18166 hierarchical_allocator_process.hpp:688] Performed allocation for 1 slaves in 45586ns
      I0430 22:06:11.240574 18166 hierarchical_allocator_process.hpp:688] Performed allocation for 1 slaves in 40264ns
      I0430 22:06:12.241654 18166 hierarchical_allocator_process.hpp:688] Performed allocation for 1 slaves in 49036ns
      I0430 22:06:13.235519 18166 master.cpp:104] No whitelist given. Advertising offers for all slaves
      I0430 22:06:13.242686 18166 hierarchical_allocator_process.hpp:688] Performed allocation for 1 slaves in 39617ns
      I0430 22:06:14.243743 18166 hierarchical_allocator_process.hpp:688] Performed allocation for 1 slaves in 41699ns
      I0430 22:06:15.244930 18166 hierarchical_allocator_process.hpp:688] Performed allocation for 1 slaves in 42512ns
      I0430 22:06:16.245286 18166 hierarchical_allocator_process.hpp:688] Performed allocation for 1 slaves in 41188ns
      I0430 22:06:17.246496 18166 hierarchical_allocator_process.hpp:688] Performed allocation for 1 slaves in 45229ns
      I0430 22:06:18.236642 18166 master.cpp:104] No whitelist given. Advertising offers for all slaves
      I0430 22:06:18.246850 18166 hierarchical_allocator_process.hpp:688] Performed allocation for 1 slaves in 44617ns
      I0430 22:06:19.247258 18166 hierarchical_allocator_process.hpp:688] Performed allocation for 1 slaves in 50257ns
      I0430 22:06:20.248572 18171 hierarchical_allocator_process.hpp:688] Performed allocation for 1 slaves in 93953ns
      I0430 22:06:21.249605 18167 hierarchical_allocator_process.hpp:688] Performed allocation for 1 slaves in 46852ns
      I0430 22:06:22.250722 18171 hierarchical_allocator_process.hpp:688] Performed allocation for 1 slaves in 40012ns
      I0430 22:06:23.236769 18171 master.cpp:104] No whitelist given. Advertising offers for all slaves
      I0430 22:06:23.250846 18171 hierarchical_allocator_process.hpp:688] Performed allocation for 1 slaves in 34819ns
      I0430 22:06:24.250983 18171 hierarchical_allocator_process.hpp:688] Performed allocation for 1 slaves in 36247ns
      I0430 22:06:25.252133 18171 hierarchical_allocator_process.hpp:688] Performed allocation for 1 slaves in 37309ns
      I0430 22:06:26.253300 18171 hierarchical_allocator_process.hpp:688] Performed allocation for 1 slaves in 38194ns
      I0430 22:06:27.254431 18171 hierarchical_allocator_process.hpp:688] Performed allocation for 1 slaves in 36559ns
      I0430 22:06:28.237540 18171 master.cpp:104] No whitelist given. Advertising offers for all slaves

      Attachments

        Activity

          People

            vinodkone Vinod Kone
            vinodkone Vinod Kone
            Votes:
            0 Vote for this issue
            Watchers:
            2 Start watching this issue

            Dates

              Created:
              Updated:
              Resolved: