Uploaded image for project: 'Mesos'
  1. Mesos
  2. MESOS-822

AllocatorTest/0.SchedulerFailover is flaky

    XMLWordPrintableJSON

    Details

    • Type: Bug
    • Status: Resolved
    • Priority: Major
    • Resolution: Fixed
    • Affects Version/s: None
    • Fix Version/s: 0.16.0
    • Component/s: None
    • Labels:
      None

      Description

      slave 201311191201-16777343-52448-17056-0 (localhost.localdomain)
      tests/allocator_tests.cpp:993: Failure
      Mock function called more times than expected - taking default action specified at:
      ./tests/mesos.hpp:412:
      Function call: resourcesUnused(@0x7f6f80025e58 201311191201-16777343-52448-17056-0000, @0x7f6f80025e38 201311191201-16777343-52448-17056-0, @0x7f6f80025e00

      { cpus(*):2, mem(*):768, disk(*):22668, ports(*):[31000-32000] }

      , @0x7f6f80025df0 16-byte object <00-00 00-00 00-00 00-00 30-10 03-80 6F-7F 00-00>)
      Expected: to be called once
      Actual: called twice - over-saturated and active

      Full Log:

      [ RUN ] AllocatorTest/0.SchedulerFailover
      I1119 12:01:32.106143 19009 exec.cpp:84] Committing suicide by killing the process group
      I1119 12:01:32.106276 19017 exec.cpp:84] Committing suicide by killing the process group
      I1119 12:01:32.108185 18999 exec.cpp:84] Committing suicide by killing the process group
      I1119 12:01:32.113991 17076 master.cpp:285] Master started on 127.0.0.1:52448
      I1119 12:01:32.114038 17076 master.cpp:299] Master ID: 201311191201-16777343-52448-17056
      I1119 12:01:32.114047 17076 master.cpp:302] Master only allowing authenticated frameworks to register!
      I1119 12:01:32.114109 17082 slave.cpp:112] Slave started on 127)@127.0.0.1:52448
      I1119 12:01:32.114209 17082 slave.cpp:212] Slave resources: cpus:3; mem:1024; disk:22668; ports:[31000-32000]
      I1119 12:01:32.114393 17080 sched.cpp:207] New master detected at master@127.0.0.1:52448
      I1119 12:01:32.114413 17080 sched.cpp:260] Authenticating with master master@127.0.0.1:52448
      I1119 12:01:32.114461 17080 sched.cpp:229] Detecting new master
      I1119 12:01:32.114497 17080 authenticatee.hpp:124] Creating new client SASL connection
      I1119 12:01:32.118248 17082 state.cpp:33] Recovering state from '/tmp/AllocatorTest_0_SchedulerFailover_LsrJz0/meta'
      I1119 12:01:32.118343 17082 status_update_manager.cpp:180] Recovering status update manager
      I1119 12:01:32.118407 17082 slave.cpp:2743] Finished recovery
      I1119 12:01:32.118463 17082 slave.cpp:497] New master detected at master@127.0.0.1:52448
      I1119 12:01:32.118517 17082 slave.cpp:524] Detecting new master
      I1119 12:01:32.118538 17082 status_update_manager.cpp:158] New master detected at master@127.0.0.1:52448
      I1119 12:01:32.118906 17076 master.cpp:1734] Authenticating framework at scheduler(119)@127.0.0.1:52448
      W1119 12:01:32.118986 17076 master.cpp:1235] Ignoring register slave message from localhost.localdomain since not elected yet
      I1119 12:01:32.119091 17076 master.cpp:85] No whitelist given. Advertising offers for all slaves
      I1119 12:01:32.119155 17076 authenticator.hpp:140] Creating new server SASL connection
      I1119 12:01:32.119243 17076 hierarchical_allocator_process.hpp:302] Initializing hierarchical allocator process with master : master@127.0.0.1:52448
      I1119 12:01:32.119279 17076 authenticatee.hpp:212] Received SASL authentication mechanisms: CRAM-MD5
      I1119 12:01:32.119293 17076 authenticatee.hpp:238] Attempting to authenticate with mechanism 'CRAM-MD5'
      I1119 12:01:32.119312 17076 master.cpp:744] The newly elected leader is master@127.0.0.1:52448
      I1119 12:01:32.119321 17076 master.cpp:748] Elected as the leading master!
      I1119 12:01:32.119343 17076 authenticator.hpp:243] Received SASL authentication start
      I1119 12:01:32.119390 17076 authenticator.hpp:325] Authentication requires more steps
      I1119 12:01:32.119417 17076 authenticatee.hpp:258] Received SASL authentication step
      I1119 12:01:32.119447 17076 authenticator.hpp:271] Received SASL authentication step
      I1119 12:01:32.119463 17076 auxprop.cpp:81] Request to lookup properties for user: 'test-principal' realm: 'localhost.localdomain' server FQDN: 'localhost.localdomain' SASL_AUXPROP_OVERRIDE: false SASL_AUXPROP_AUTHZID: false
      I1119 12:01:32.119472 17076 auxprop.cpp:153] Looking up auxiliary property '*userPassword'
      I1119 12:01:32.119482 17076 auxprop.cpp:153] Looking up auxiliary property '*cmusaslsecretCRAM-MD5'
      I1119 12:01:32.119490 17076 auxprop.cpp:81] Request to lookup properties for user: 'test-principal' realm: 'localhost.localdomain' server FQDN: 'localhost.localdomain' SASL_AUXPROP_OVERRIDE: false SASL_AUXPROP_AUTHZID: true
      I1119 12:01:32.119498 17076 auxprop.cpp:103] Skipping auxiliary property '*userPassword' since SASL_AUXPROP_AUTHZID == true
      I1119 12:01:32.119503 17076 auxprop.cpp:103] Skipping auxiliary property '*cmusaslsecretCRAM-MD5' since SASL_AUXPROP_AUTHZID == true
      I1119 12:01:32.119514 17076 authenticator.hpp:317] Authentication success
      I1119 12:01:32.119532 17076 authenticatee.hpp:298] Authentication success
      I1119 12:01:32.119547 17076 master.cpp:1774] Successfully authenticated framework at scheduler(119)@127.0.0.1:52448
      I1119 12:01:32.119604 17076 sched.cpp:334] Successfully authenticated with master master@127.0.0.1:52448
      I1119 12:01:32.119642 17076 master.cpp:798] Received registration request from scheduler(119)@127.0.0.1:52448
      I1119 12:01:32.119693 17076 master.cpp:816] Registering framework 201311191201-16777343-52448-17056-0000 at scheduler(119)@127.0.0.1:52448
      I1119 12:01:32.119742 17076 sched.cpp:373] Framework registered with 201311191201-16777343-52448-17056-0000
      I1119 12:01:32.119766 17076 sched.cpp:387] Scheduler::registered took 10828ns
      I1119 12:01:32.119802 17076 hierarchical_allocator_process.hpp:332] Added framework 201311191201-16777343-52448-17056-0000
      I1119 12:01:32.119812 17076 hierarchical_allocator_process.hpp:726] No resources available to allocate!
      I1119 12:01:32.119818 17076 hierarchical_allocator_process.hpp:688] Performed allocation for 0 slaves in 7043ns
      2013-11-19 12:01:33,112:17056(0x7f6f7b5fe700):ZOO_ERROR@handle_socket_error_msg@1579: Socket [127.0.0.1:50700] zk retcode=-4, errno=111(Connection refused): server refused to accept the client
      I1119 12:01:33.119645 17077 master.cpp:1266] Attempting to register slave on localhost.localdomain at slave(127)@127.0.0.1:52448
      I1119 12:01:33.119696 17077 master.cpp:2513] Adding slave 201311191201-16777343-52448-17056-0 at localhost.localdomain with cpus:3; mem:1024; disk:22668; ports:[31000-32000]
      I1119 12:01:33.119916 17077 slave.cpp:542] Registered with master master@127.0.0.1:52448; given slave ID 201311191201-16777343-52448-17056-0
      I1119 12:01:33.120085 17077 hierarchical_allocator_process.hpp:445] Added slave 201311191201-16777343-52448-17056-0 (localhost.localdomain) with cpus:3; mem:1024; disk:22668; ports:[31000-32000] (and cpus:3; mem:1024; disk:22668; ports:[31000-32000] available)
      I1119 12:01:33.120193 17077 hierarchical_allocator_process.hpp:752] Offering cpus:3; mem:1024; disk:22668; ports:[31000-32000] on slave 201311191201-16777343-52448-17056-0 to framework 201311191201-16777343-52448-17056-0000
      I1119 12:01:33.120355 17077 hierarchical_allocator_process.hpp:708] Performed allocation for slave 201311191201-16777343-52448-17056-0 in 202267ns
      I1119 12:01:33.120445 17077 master.hpp:428] Adding offer 201311191201-16777343-52448-17056-0 with resources cpus:3; mem:1024; disk:22668; ports:[31000-32000] on slave 201311191201-16777343-52448-17056-0 (localhost.localdomain)
      I1119 12:01:33.120522 17077 master.cpp:1700] Sending 1 offers to framework 201311191201-16777343-52448-17056-0000
      I1119 12:01:33.121275 17077 sched.cpp:480] Scheduler::resourceOffers took 627781ns
      I1119 12:01:33.121891 17078 master.cpp:2026] Processing reply for offer 201311191201-16777343-52448-17056-0 on slave 201311191201-16777343-52448-17056-0 (localhost.localdomain) for framework 201311191201-16777343-52448-17056-0000
      I1119 12:01:33.122030 17078 master.hpp:400] Adding task 0 with resources cpus:1; mem:256 on slave 201311191201-16777343-52448-17056-0 (localhost.localdomain)
      I1119 12:01:33.122130 17078 master.cpp:2150] Launching task 0 of framework 201311191201-16777343-52448-17056-0000 with resources cpus:1; mem:256 on slave 201311191201-16777343-52448-17056-0 (localhost.localdomain)
      I1119 12:01:33.122335 17078 master.hpp:438] Removing offer 201311191201-16777343-52448-17056-0 with resources cpus:3; mem:1024; disk:22668; ports:[31000-32000] on slave 201311191201-16777343-52448-17056-0 (localhost.localdomain)
      I1119 12:01:33.122450 17078 slave.cpp:722] Got assigned task 0 for framework 201311191201-16777343-52448-17056-0000
      I1119 12:01:33.122722 17078 slave.cpp:833] Launching task 0 for framework 201311191201-16777343-52448-17056-0000
      I1119 12:01:33.122972 17082 hierarchical_allocator_process.hpp:547] Framework 201311191201-16777343-52448-17056-0000 left cpus:2; mem:768; disk:22668; ports:[31000-32000] unused on slave 201311191201-16777343-52448-17056-0
      I1119 12:01:33.125586 17083 hierarchical_allocator_process.hpp:752] Offering cpus:2; mem:768; disk:22668; ports:[31000-32000] on slave 201311191201-16777343-52448-17056-0 to framework 201311191201-16777343-52448-17056-0000
      I1119 12:01:33.125779 17083 hierarchical_allocator_process.hpp:688] Performed allocation for 1 slaves in 263589ns
      I1119 12:01:33.125910 17083 master.hpp:428] Adding offer 201311191201-16777343-52448-17056-1 with resources cpus:2; mem:768; disk:22668; ports:[31000-32000] on slave 201311191201-16777343-52448-17056-0 (localhost.localdomain)
      I1119 12:01:33.125987 17083 master.cpp:1700] Sending 1 offers to framework 201311191201-16777343-52448-17056-0000
      I1119 12:01:33.126130 17083 sched.cpp:480] Scheduler::resourceOffers took 41584ns
      I1119 12:01:33.126202 17083 master.cpp:2026] Processing reply for offer 201311191201-16777343-52448-17056-1 on slave 201311191201-16777343-52448-17056-0 (localhost.localdomain) for framework 201311191201-16777343-52448-17056-0000
      I1119 12:01:33.126309 17083 master.hpp:438] Removing offer 201311191201-16777343-52448-17056-1 with resources cpus:2; mem:768; disk:22668; ports:[31000-32000] on slave 201311191201-16777343-52448-17056-0 (localhost.localdomain)
      tests/allocator_tests.cpp:993: Failure
      Mock function called more times than expected - taking default action specified at:
      ./tests/mesos.hpp:412:
      Function call: resourcesUnused(@0x7f6f80025e58 201311191201-16777343-52448-17056-0000, @0x7f6f80025e38 201311191201-16777343-52448-17056-0, @0x7f6f80025e00

      { cpus(*):2, mem(*):768, disk(*):22668, ports(*):[31000-32000] }

      , @0x7f6f80025df0 16-byte object <00-00 00-00 00-00 00-00 30-10 03-80 6F-7F 00-00>)
      Expected: to be called once
      Actual: called twice - over-saturated and active
      I1119 12:01:33.126698 17083 hierarchical_allocator_process.hpp:547] Framework 201311191201-16777343-52448-17056-0000 left cpus:2; mem:768; disk:22668; ports:[31000-32000] unused on slave 201311191201-16777343-52448-17056-0
      I1119 12:01:33.126823 17083 hierarchical_allocator_process.hpp:590] Framework 201311191201-16777343-52448-17056-0000 filtered slave 201311191201-16777343-52448-17056-0 for 5secs
      I1119 12:01:33.131556 17077 exec.cpp:178] Executor started at: executor(38)@127.0.0.1:52448 with pid 17056
      I1119 12:01:33.132163 17078 slave.cpp:943] Queuing task '0' for executor default of framework '201311191201-16777343-52448-17056-0000
      I1119 12:01:33.132760 17078 slave.cpp:466] Successfully attached file '/tmp/AllocatorTest_0_SchedulerFailover_LsrJz0/slaves/201311191201-16777343-52448-17056-0/frameworks/201311191201-16777343-52448-17056-0000/executors/default/runs/9e17d16e-9bcc-4c74-a963-bd3aebe198de'
      I1119 12:01:33.133375 17078 slave.cpp:1406] Got registration for executor 'default' of framework 201311191201-16777343-52448-17056-0000
      I1119 12:01:33.133811 17076 exec.cpp:202] Executor registered on slave 201311191201-16777343-52448-17056-0
      I1119 12:01:33.134820 17076 exec.cpp:214] Executor::registered took 16123ns
      I1119 12:01:33.135026 17078 slave.cpp:1527] Flushing queued task 0 for executor 'default' of framework 201311191201-16777343-52448-17056-0000
      I1119 12:01:33.135469 17082 exec.cpp:289] Executor asked to run task '0'
      I1119 12:01:33.135507 17082 exec.cpp:298] Executor::launchTask took 20880ns
      I1119 12:01:33.135705 17077 sched.cpp:638] Stopping framework '201311191201-16777343-52448-17056-0000'
      I1119 12:01:33.135766 17077 master.cpp:562] Framework 201311191201-16777343-52448-17056-0000 disconnected
      I1119 12:01:33.135777 17077 master.cpp:1032] Deactivating framework 201311191201-16777343-52448-17056-0000
      I1119 12:01:33.135797 17077 master.cpp:584] Giving framework 201311191201-16777343-52448-17056-0000 100ms to failover
      I1119 12:01:33.135895 17077 hierarchical_allocator_process.hpp:408] Deactivated framework 201311191201-16777343-52448-17056-0000
      I1119 12:01:33.136298 17077 sched.cpp:207] New master detected at master@127.0.0.1:52448
      I1119 12:01:33.136318 17077 sched.cpp:260] Authenticating with master master@127.0.0.1:52448
      I1119 12:01:33.136359 17077 sched.cpp:229] Detecting new master
      I1119 12:01:33.136404 17077 authenticatee.hpp:124] Creating new client SASL connection
      I1119 12:01:33.136474 17077 master.cpp:1734] Authenticating framework at scheduler(120)@127.0.0.1:52448
      I1119 12:01:33.136549 17077 authenticator.hpp:140] Creating new server SASL connection
      I1119 12:01:33.136607 17077 authenticatee.hpp:212] Received SASL authentication mechanisms: CRAM-MD5
      I1119 12:01:33.136622 17077 authenticatee.hpp:238] Attempting to authenticate with mechanism 'CRAM-MD5'
      I1119 12:01:33.136646 17077 authenticator.hpp:243] Received SASL authentication start
      I1119 12:01:33.136682 17077 authenticator.hpp:325] Authentication requires more steps
      I1119 12:01:33.136708 17077 authenticatee.hpp:258] Received SASL authentication step
      I1119 12:01:33.136744 17077 authenticator.hpp:271] Received SASL authentication step
      I1119 12:01:33.136761 17077 auxprop.cpp:81] Request to lookup properties for user: 'test-principal' realm: 'localhost.localdomain' server FQDN: 'localhost.localdomain' SASL_AUXPROP_OVERRIDE: false SASL_AUXPROP_AUTHZID: false
      I1119 12:01:33.136770 17077 auxprop.cpp:153] Looking up auxiliary property '*userPassword'
      I1119 12:01:33.136781 17077 auxprop.cpp:153] Looking up auxiliary property '*cmusaslsecretCRAM-MD5'
      I1119 12:01:33.136790 17077 auxprop.cpp:81] Request to lookup properties for user: 'test-principal' realm: 'localhost.localdomain' server FQDN: 'localhost.localdomain' SASL_AUXPROP_OVERRIDE: false SASL_AUXPROP_AUTHZID: true
      I1119 12:01:33.136796 17077 auxprop.cpp:103] Skipping auxiliary property '*userPassword' since SASL_AUXPROP_AUTHZID == true
      I1119 12:01:33.136802 17077 auxprop.cpp:103] Skipping auxiliary property '*cmusaslsecretCRAM-MD5' since SASL_AUXPROP_AUTHZID == true
      I1119 12:01:33.136814 17077 authenticator.hpp:317] Authentication success
      I1119 12:01:33.136868 17077 authenticatee.hpp:298] Authentication success
      I1119 12:01:33.136894 17077 master.cpp:1774] Successfully authenticated framework at scheduler(120)@127.0.0.1:52448
      I1119 12:01:33.136978 17077 sched.cpp:334] Successfully authenticated with master master@127.0.0.1:52448
      I1119 12:01:33.137032 17077 master.cpp:886] Re-registering framework 201311191201-16777343-52448-17056-0000 at scheduler(120)@127.0.0.1:52448
      I1119 12:01:33.137045 17077 master.cpp:912] Framework 201311191201-16777343-52448-17056-0000 failed over
      I1119 12:01:33.137140 17077 sched.cpp:373] Framework registered with 201311191201-16777343-52448-17056-0000
      I1119 12:01:33.137161 17077 sched.cpp:387] Scheduler::registered took 8413ns
      I1119 12:01:33.137184 17077 hierarchical_allocator_process.hpp:378] Activated framework 201311191201-16777343-52448-17056-0000
      I1119 12:01:33.137224 17077 hierarchical_allocator_process.hpp:752] Offering cpus:2; mem:768; disk:22668; ports:[31000-32000] on slave 201311191201-16777343-52448-17056-0 to framework 201311191201-16777343-52448-17056-0000
      I1119 12:01:33.137315 17077 hierarchical_allocator_process.hpp:688] Performed allocation for 1 slaves in 118282ns
      I1119 12:01:33.137363 17077 master.hpp:428] Adding offer 201311191201-16777343-52448-17056-2 with resources cpus:2; mem:768; disk:22668; ports:[31000-32000] on slave 201311191201-16777343-52448-17056-0 (localhost.localdomain)
      I1119 12:01:33.137404 17077 master.cpp:1700] Sending 1 offers to framework 201311191201-16777343-52448-17056-0000
      I1119 12:01:33.137485 17077 sched.cpp:480] Scheduler::resourceOffers took 12196ns
      I1119 12:01:33.137682 17056 master.cpp:554] Master terminating
      I1119 12:01:33.137706 17056 master.cpp:210] Shutting down master
      I1119 12:01:33.137748 17056 master.hpp:418] Removing task 0 with resources cpus:1; mem:256 on slave 201311191201-16777343-52448-17056-0 (localhost.localdomain)
      I1119 12:01:33.137806 17056 master.hpp:438] Removing offer 201311191201-16777343-52448-17056-2 with resources cpus:2; mem:768; disk:22668; ports:[31000-32000] on slave 201311191201-16777343-52448-17056-0 (localhost.localdomain)
      I1119 12:01:33.137882 17056 master.cpp:253] Removing slave 201311191201-16777343-52448-17056-0 (localhost.localdomain)
      I1119 12:01:33.138190 17081 sched.cpp:638] Stopping framework '201311191201-16777343-52448-17056-0000'
      I1119 12:01:33.138613 17078 slave.cpp:391] Slave terminating
      I1119 12:01:33.138898 17078 slave.cpp:1122] Asked to shut down framework 201311191201-16777343-52448-17056-0000 by @0.0.0.0:0
      I1119 12:01:33.139130 17078 slave.cpp:1147] Shutting down framework 201311191201-16777343-52448-17056-0000
      I1119 12:01:33.139390 17078 slave.cpp:2406] Shutting down executor 'default' of framework 201311191201-16777343-52448-17056-0000
      I1119 12:01:33.139701 17080 exec.cpp:375] Executor asked to shutdown
      I1119 12:01:33.139726 17080 exec.cpp:390] Executor::shutdown took 8678ns
      [ FAILED ] AllocatorTest/0.SchedulerFailover, where TypeParam = mesos::internal::master::allocator::HierarchicalAllocatorProcess<mesos::internal::master::allocator::DRFSorter, mesos::internal::master::allocator::DRFSorter> (1041 ms)

        Attachments

          Issue Links

            Activity

              People

              • Assignee:
                bmahler Benjamin Mahler
                Reporter:
                xujyan Yan Xu
              • Votes:
                0 Vote for this issue
                Watchers:
                2 Start watching this issue

                Dates

                • Created:
                  Updated:
                  Resolved: