Mesos / MESOS-4712

Remove 'force' field from the Subscribe Call in v1 Scheduler API

Details

    • Type: Task
    • Status: Resolved
    • Priority: Major
    • Resolution: Fixed
    • Affects Version/s: None
    • Fix Version/s: 0.28.0
    • Component/s: None
    • Labels: None

    Description

      I introduced the `force` field in the SUBSCRIBE call to deal with scheduler partition cases. Having thought about it more and discussed it with a few other folks (anandmazumdar, greggomann), I think we can get away without that field in the v1 API. The obvious advantage of removing the field is that framework developers don't have to think about how or when to set it (the current semantics are a bit confusing).

      The new workflow when a master receives a SUBSCRIBE call is that the master always accepts the call and closes any existing connection from the same scheduler (identified by its framework ID), after first sending an ERROR event on that connection.
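
      A minimal Python sketch of the master-side rule, assuming a toy in-memory model: `Connection`, `Master`, and `subscribe()` are hypothetical names for illustration, not the Mesos master's actual C++ code. An event sent on an already-closed connection is simply lost, which is also what happens to the ERROR in scenario 1 below.

```python
from __future__ import annotations

from dataclasses import dataclass, field


@dataclass
class Connection:
    """Hypothetical stand-in for a scheduler's SUBSCRIBE streaming connection."""
    name: str
    closed: bool = False
    events: list[str] = field(default_factory=list)

    def send(self, event: str) -> bool:
        # An event sent on a closed connection never reaches the scheduler.
        if self.closed:
            return False
        self.events.append(event)
        return True

    def close(self) -> None:
        self.closed = True


class Master:
    """Toy model of the proposed rule: the master always accepts SUBSCRIBE."""

    def __init__(self) -> None:
        # framework ID -> connection of the currently accepted subscriber
        self.subscriptions: dict[str, Connection] = {}

    def subscribe(self, framework_id: str, conn: Connection) -> None:
        old = self.subscriptions.get(framework_id)
        if old is not None and old is not conn:
            # Tell the previous subscriber it has been replaced, then drop it.
            old.send("ERROR")
            old.close()
        self.subscriptions[framework_id] = conn
        conn.send("SUBSCRIBED")


master = Master()
first, second = Connection("first"), Connection("second")
master.subscribe("framework-1", first)
master.subscribe("framework-1", second)
print(first.events, first.closed)    # ['SUBSCRIBED', 'ERROR'] True
print(second.events, second.closed)  # ['SUBSCRIBED'] False
```

      In this model the master never rejects a SUBSCRIBE: the most recent call for a framework ID always wins, so there is nothing for a `force` flag to decide.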

      Schedulers, in turn, are expected to close the old subscription connection before sending a new SUBSCRIBE call.
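
      On the scheduler side, this expectation can be sketched as a `subscribe()` that always tears down its previous connection first. `FakeConnection`, `Scheduler`, the injected `connect` factory, and the message text are hypothetical; a real scheduler would open an HTTP connection to the master instead.

```python
from __future__ import annotations

import itertools
from typing import Callable, Optional


class FakeConnection:
    """Hypothetical connection that records what happens to it in a shared log."""

    def __init__(self, name: str, log: list[str]) -> None:
        self.name = name
        self.log = log
        self.closed = False

    def send(self, message: str) -> None:
        self.log.append(f"{self.name}: send {message}")

    def close(self) -> None:
        self.closed = True
        self.log.append(f"{self.name}: close")


class Scheduler:
    def __init__(self, framework_id: str,
                 connect: Callable[[], FakeConnection]) -> None:
        self.framework_id = framework_id
        self.connect = connect  # hypothetical factory for new connections
        self.conn: Optional[FakeConnection] = None

    def subscribe(self) -> None:
        # Close the old subscription connection *before* sending a new
        # SUBSCRIBE, so this scheduler never holds two live subscriptions.
        if self.conn is not None and not self.conn.closed:
            self.conn.close()
        self.conn = self.connect()
        self.conn.send(f"SUBSCRIBE framework_id={self.framework_id}")


log: list[str] = []
ids = itertools.count(1)
scheduler = Scheduler("framework-1", lambda: FakeConnection(f"conn-{next(ids)}", log))
scheduler.subscribe()
scheduler.subscribe()  # e.g. after detecting a disconnection
for line in log:
    print(line)
# conn-1: send SUBSCRIBE framework_id=framework-1
# conn-1: close
# conn-2: send SUBSCRIBE framework_id=framework-1
```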

      Let's look at some tricky scenarios to see how this works and why it is safe.

      1) Disconnection @ the scheduler but not @ the master

      The scheduler sees the disconnection and sends a new SUBSCRIBE call. The master sends ERROR on the old connection (which the scheduler won't receive because that connection is already closed) and closes it.

      2) Disconnection @ the master but not @ the scheduler

      The scheduler detects this from the lack of HEARTBEAT events. It then closes its existing connection and sends a new SUBSCRIBE call. The master accepts the new SUBSCRIBE call; there is no old connection to close because the master has already closed it.
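
      Scenario 2 relies on the scheduler noticing missing HEARTBEAT events. A minimal sketch, assuming the scheduler knows the heartbeat interval and gives up after a hypothetical number of missed heartbeats (3 here); `HeartbeatMonitor`, `check_connection`, and their parameters are illustrative, not part of the Mesos API. Timestamps are passed in explicitly so the logic is easy to test; a real scheduler would read a monotonic clock such as `time.monotonic()`.

```python
from __future__ import annotations

from typing import Callable


class HeartbeatMonitor:
    """Hypothetical helper that flags a subscription whose HEARTBEATs stopped."""

    def __init__(self, interval_s: float, allowed_misses: int = 3) -> None:
        # The connection counts as dead once this long passes with no HEARTBEAT.
        self.timeout_s = interval_s * allowed_misses
        self.last_heartbeat: float | None = None

    def on_heartbeat(self, now: float) -> None:
        self.last_heartbeat = now

    def is_dead(self, now: float) -> bool:
        if self.last_heartbeat is None:
            return False  # no HEARTBEAT seen yet; nothing to judge
        return now - self.last_heartbeat > self.timeout_s


def check_connection(monitor: HeartbeatMonitor, now: float,
                     resubscribe: Callable[[], None]) -> bool:
    """If HEARTBEATs stopped, trigger a resubscribe and report True."""
    if monitor.is_dead(now):
        resubscribe()  # must close the old connection before sending SUBSCRIBE
        return True
    return False


monitor = HeartbeatMonitor(interval_s=15.0)  # 15 s is an illustrative value
monitor.on_heartbeat(now=0.0)
calls: list[str] = []
print(check_connection(monitor, 30.0, lambda: calls.append("resubscribe")))  # False
print(check_connection(monitor, 46.0, lambda: calls.append("resubscribe")))  # True
```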

      3) Scheduler failover but no disconnection @ the master

      The newly elected scheduler sends a SUBSCRIBE call. The master sends an ERROR event on the old connection and closes it (the event won't be received because the old scheduler has failed over).

      4) Scheduler A is partitioned (but still alive and connected to the master) and Scheduler B is elected as the new leader.

      When Scheduler B sends SUBSCRIBE, the master sends ERROR to Scheduler A and closes its connection, then accepts Scheduler B's connection. Typically Scheduler A aborts after receiving ERROR and is restarted. After the restart it won't become the leader because Scheduler B has already been elected.
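
      Scenario 4 can be replayed as a toy simulation. It assumes some leader-election service that grants leadership to only one scheduler instance at a time; `Election`, `SchedulerInstance`, and the abort-on-ERROR handler are hypothetical names that mirror the typical behavior described above, not Mesos or framework code.

```python
from __future__ import annotations

from dataclasses import dataclass


@dataclass
class Election:
    """Hypothetical leader-election record: at most one leader at a time."""
    leader: str | None = None

    def try_acquire(self, candidate: str) -> bool:
        if self.leader is None:
            self.leader = candidate
        return self.leader == candidate


@dataclass
class SchedulerInstance:
    name: str
    election: Election
    subscribed: bool = False
    aborted: bool = False

    def start(self) -> None:
        # Only the elected leader goes on to send SUBSCRIBE.
        self.aborted = False
        self.subscribed = self.election.try_acquire(self.name)

    def on_event(self, event: str) -> None:
        if event == "ERROR":
            # Typical reaction: abort and let a supervisor restart the process.
            self.subscribed = False
            self.aborted = True


election = Election()
a = SchedulerInstance("A", election)
a.start()                 # A is the leader and subscribed

election.leader = None    # A is partitioned and loses leadership...
b = SchedulerInstance("B", election)
b.start()                 # ...B is elected and sends SUBSCRIBE

a.on_event("ERROR")       # the master sends ERROR to A and closes its connection
a.start()                 # A restarts but cannot win the election
print(a.subscribed, b.subscribed, election.leader)  # False True B
```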

      5) The scheduler sends SUBSCRIBE (A), times out, closes that connection, and sends a new SUBSCRIBE (B). The master receives SUBSCRIBE (B) and then SUBSCRIBE (A), but has not yet noticed A's disconnection.

      The master first accepts SUBSCRIBE (B). When it then receives SUBSCRIBE (A), it sends ERROR on B's connection and closes it. When it accepts SUBSCRIBE (A) and tries to send the SUBSCRIBED event, it detects that A's connection is closed. The scheduler then retries SUBSCRIBE after a backoff. I think this race is rare enough that it is unlikely to happen repeatedly in a loop.
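
      The race in scenario 5 can be stepped through in a small simulation. It assumes a toy master (hypothetical `Conn` and `Master` classes) that notices a closed connection only when a send on it fails, and a scheduler retry policy of exponential backoff with full jitter. The issue only says "after a backoff", so the jitter and the `backoff_delay` parameters are assumptions; randomizing the delay helps make it unlikely that the same interleaving recurs on every retry.

```python
from __future__ import annotations

import random


class Conn:
    """Hypothetical SUBSCRIBE connection."""

    def __init__(self, name: str) -> None:
        self.name = name
        self.closed = False
        self.events: list[str] = []

    def send(self, event: str) -> bool:
        if self.closed:
            return False  # this is where the master notices the closure
        self.events.append(event)
        return True


class Master:
    def __init__(self) -> None:
        self.subscriptions: dict[str, Conn] = {}

    def subscribe(self, framework_id: str, conn: Conn) -> bool:
        old = self.subscriptions.pop(framework_id, None)
        if old is not None and old is not conn:
            old.send("ERROR")
            old.closed = True
        if not conn.send("SUBSCRIBED"):
            return False  # new connection already gone; nobody is subscribed
        self.subscriptions[framework_id] = conn
        return True


def backoff_delay(attempt: int, rng: random.Random,
                  base_s: float = 1.0, cap_s: float = 60.0) -> float:
    """Exponential backoff with full jitter (assumed policy, not from the issue)."""
    return rng.uniform(0.0, min(cap_s, base_s * 2 ** attempt))


master = Master()
a, b = Conn("A"), Conn("B")
a.closed = True                   # the scheduler timed out and closed A
print(master.subscribe("fw", b))  # True: B arrives first and is accepted
print(master.subscribe("fw", a))  # False: B gets ERROR, A is found closed
print(b.events, "fw" in master.subscriptions)  # ['SUBSCRIBED', 'ERROR'] False

delay = backoff_delay(attempt=0, rng=random.Random(7))
c = Conn("C")                     # the retry, sent after `delay` seconds
print(0.0 <= delay <= 1.0, master.subscribe("fw", c))  # True True
```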

          People

            Vinod Kone