Mesos / MESOS-6249

On Mesos master failover the reregistered callback is not triggered


Details

    • Type: Bug
    • Status: Resolved
    • Priority: Major
    • Resolution: Won't Fix
    • Affects Version/s: 0.28.0, 0.28.1, 1.0.1
    • Fix Version/s: None
    • Component/s: java api
    • Labels: None
    • Environment: OS X 10.11.6

    Description

      After a Mesos master failover, the reregistered callback of the Java API is not triggered. Only the registered callback is triggered, which makes it hard for a framework to distinguish an initial registration from a re-registration after a failover.

      This behaviour was reproduced with the ConductR framework using Java API versions 0.28.0, 0.28.1, and 1.0.1. The logs below are from the master that was re-elected and from the ConductR framework.

      Log: Mesos master on a master re-election

      I0926 11:44:20.008306 3747840 zookeeper.cpp:259] A new leading master (UPID=master@127.0.0.1:5050) is detected
      I0926 11:44:20.008458 3747840 master.cpp:1847] The newly elected leader is master@127.0.0.1:5050 with id ca5b9713-1eec-43e1-9d27-9ebc5c0f95b1
      I0926 11:44:20.008484 3747840 master.cpp:1860] Elected as the leading master!
      I0926 11:44:20.008498 3747840 master.cpp:1547] Recovering from registrar
      I0926 11:44:20.008607 3747840 registrar.cpp:332] Recovering registrar
      I0926 11:44:20.016340 4284416 registrar.cpp:365] Successfully fetched the registry (0B) in 7.702016ms
      I0926 11:44:20.016393 4284416 registrar.cpp:464] Applied 1 operations in 12us; attempting to update the 'registry'
      I0926 11:44:20.021428 4284416 registrar.cpp:509] Successfully updated the 'registry' in 5.019904ms
      I0926 11:44:20.021481 4284416 registrar.cpp:395] Successfully recovered registrar
      I0926 11:44:20.021611 528384 master.cpp:1655] Recovered 0 agents from the Registry (118B) ; allowing 10mins for agents to re-register
      I0926 11:44:20.536859 3747840 master.cpp:2424] Received SUBSCRIBE call for framework 'conductr' at scheduler-3f8b9645-7a17-4e9f-8ad5-077fe8c23b39@192.168.2.106:57164
      I0926 11:44:20.536969 3747840 master.cpp:2500] Subscribing framework conductr with checkpointing disabled and capabilities [  ]
      I0926 11:44:20.537401 3211264 hierarchical.cpp:271] Added framework conductr
      I0926 11:44:20.807895 528384 master.cpp:4787] Re-registering agent b99256c3-6905-44d3-bcc9-0d9e00d20fbe-S0 at slave(1)@127.0.0.1:5051 (127.0.0.1)
      I0926 11:44:20.808145 1601536 registrar.cpp:464] Applied 1 operations in 38us; attempting to update the 'registry'
      I0926 11:44:20.815757 1601536 registrar.cpp:509] Successfully updated the 'registry' in 7.568896ms
      I0926 11:44:20.815992 3747840 master.cpp:7447] Adding task 6abce9bb-895f-4f6f-be5b-25f6bd09f548 with resources mem(*):0 on agent b99256c3-6905-44d3-bcc9-0d9e00d20fbe-S0 (127.0.0.1)
      I0926 11:44:20.816339 3747840 master.cpp:4872] Re-registered agent b99256c3-6905-44d3-bcc9-0d9e00d20fbe-S0 at slave(1)@127.0.0.1:5051 (127.0.0.1) with cpus(*):8; mem(*):15360; disk(*):470832; ports(*):[31000-32000]
      I0926 11:44:20.816385 1601536 hierarchical.cpp:478] Added agent b99256c3-6905-44d3-bcc9-0d9e00d20fbe-S0 (127.0.0.1) with cpus(*):8; mem(*):15360; disk(*):470832; ports(*):[31000-32000] (allocated: cpus(*):0.9; mem(*):402.653; disk(*):1000; ports(*):[31000-31000, 31001-31500])
      I0926 11:44:20.816437 3747840 master.cpp:4940] Sending updated checkpointed resources  to agent b99256c3-6905-44d3-bcc9-0d9e00d20fbe-S0 at slave(1)@127.0.0.1:5051 (127.0.0.1)
      I0926 11:44:20.816787 4284416 master.cpp:5725] Sending 1 offers to framework conductr (conductr) at scheduler-3f8b9645-7a17-4e9f-8ad5-077fe8c23b39@192.168.2.106:57164
      

      Log: ConductR framework

      I0926 11:44:20.007189 66441216 detector.cpp:152] Detected a new leader: (id='87')
      I0926 11:44:20.007524 64294912 group.cpp:706] Trying to get '/mesos/json.info_0000000087' in ZooKeeper
      I0926 11:44:20.008625 63758336 zookeeper.cpp:259] A new leading master (UPID=master@127.0.0.1:5050) is detected
      I0926 11:44:20.008965 63758336 sched.cpp:330] New master detected at master@127.0.0.1:5050
      2016-09-26T09:44:20Z MacBook-Pro-6.local INFO  MesosSchedulerClient [sourceThread=conductr-akka.actor.default-dispatcher-2, akkaTimestamp=09:44:20.009UTC, akkaSource=akka.tcp://conductr@127.0.0.1:9004/user/reaper/mesos-client-supervisor/singleton/mesos-client, sourceActorSystem=conductr] - Mesos master has been disconnected..
      I0926 11:44:20.012472 63758336 sched.cpp:341] No credentials provided. Attempting to register without authentication
      I0926 11:44:20.537613 65904640 sched.cpp:743] Framework registered with conductr
      2016-09-26T09:44:20Z MacBook-Pro-6.local INFO  MesosSchedulerClient [sourceThread=conductr-akka.actor.default-dispatcher-18, akkaTimestamp=09:44:20.538UTC, akkaSource=akka.tcp://conductr@127.0.0.1:9004/user/reaper/mesos-client-supervisor/singleton/mesos-client, sourceActorSystem=conductr] - Mesos master on localhost:5050 has been registered with ConductR framework id: conductr
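
Because only the registered callback fires in the scenario logged above, a framework could keep its own record of whether it has registered before. The following is a minimal sketch of that idea, not part of the Mesos API: the class RegistrationTracker and its methods are hypothetical names, and the sketch assumes the framework forwards the registered, reregistered, and disconnected callbacks of org.apache.mesos.Scheduler to it. The state is held in memory only, so it would not survive a restart of the scheduler process itself.

```java
/**
 * Hypothetical helper (not part of Mesos) that classifies a "registered"
 * callback as a first registration or as a re-registration, based on
 * whether this scheduler process has been registered before.
 */
final class RegistrationTracker {

    enum Event { FIRST_REGISTRATION, REREGISTRATION }

    private boolean registeredBefore = false;
    private boolean connected = false;

    /** Assumed to be called from Scheduler.registered(driver, frameworkId, masterInfo). */
    synchronized Event onRegistered() {
        Event event = registeredBefore ? Event.REREGISTRATION : Event.FIRST_REGISTRATION;
        registeredBefore = true;
        connected = true;
        return event;
    }

    /** Assumed to be called from Scheduler.reregistered(driver, masterInfo). */
    synchronized Event onReregistered() {
        registeredBefore = true;
        connected = true;
        return Event.REREGISTRATION;
    }

    /** Assumed to be called from Scheduler.disconnected(driver). */
    synchronized void onDisconnected() {
        connected = false;
    }

    synchronized boolean isConnected() {
        return connected;
    }

    public static void main(String[] args) {
        RegistrationTracker tracker = new RegistrationTracker();
        // Initial registration with the first leading master.
        System.out.println(tracker.onRegistered());   // prints FIRST_REGISTRATION
        // Master failover: disconnected, then registered is called again.
        tracker.onDisconnected();
        System.out.println(tracker.onRegistered());   // prints REREGISTRATION
    }
}
```

With this approach the framework treats any registered callback after the first one as a re-registration, regardless of which callback the driver invokes.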
      

            People

              Assignee: Unassigned
              Reporter: Markus Jura (markusjura)
              Votes: 0
              Watchers: 4
