Mesos / MESOS-5181

Master should reject calls from the scheduler driver if the scheduler is not connected.

Details

    • Type: Bug
    • Status: Resolved
    • Priority: Major
    • Resolution: Fixed
    • Affects Version/s: 0.24.0
    • Fix Version/s: 1.0.0
    • Component/s: scheduler driver
    • Sprint: Mesosphere Sprint 34
    • Story Points: 1

    Description

      When a scheduler registers, the master will create a link from master to scheduler. If this link breaks, the master will consider the scheduler inactive and mark it as disconnected.

      This causes a couple of problems:
      1) The master does not send offers to inactive schedulers, yet in a one-way network partition these schedulers may still consider themselves "registered".
      2) Calls from the inactive scheduler are still accepted, which leaves the scheduler in a starved but semi-functional state.

      See the related issue for more context: MESOS-5180

      There should be an additional guard for registered but inactive schedulers here:
      https://github.com/apache/mesos/blob/94f4f4ebb7d491ec6da1473b619600332981dd8e/src/master/master.cpp#L1977

      The HTTP API already does this:
      https://github.com/apache/mesos/blob/94f4f4ebb7d491ec6da1473b619600332981dd8e/src/master/http.cpp#L459

      Since the scheduler driver cannot return a 403, it may be necessary to return an Event::ERROR and force the scheduler to abort.
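
      The guard proposed above could look roughly like the following minimal sketch. This is not the actual master code: `Framework`, `CallResult`, and `handleDriverCall` are hypothetical names introduced only for illustration, and the error strings are made up. It only models the decision itself, namely that a call from a registered but disconnected scheduler is rejected with an error instead of being processed.

```cpp
#include <string>

// Hypothetical, simplified stand-in for the master's per-framework state.
struct Framework {
  bool registered;  // The scheduler has completed registration.
  bool connected;   // The master-to-scheduler link is still intact.

  Framework() : registered(false), connected(false) {}
};

// Hypothetical outcome of handling one scheduler driver call.
struct CallResult {
  bool accepted;      // True if the master goes on to process the call.
  std::string error;  // Non-empty when the call is rejected; this is the
                      // message an Event::ERROR-style reply would carry.
};

// Sketch of the guard: reject calls from unknown frameworks (assumed to be
// the existing check) and, additionally, from frameworks that are registered
// but whose link to the master has broken.
inline CallResult handleDriverCall(const Framework& framework) {
  CallResult result;
  result.accepted = false;

  if (!framework.registered) {
    result.error = "Framework is not registered";
    return result;
  }

  // The additional guard discussed in this ticket.
  if (!framework.connected) {
    result.error = "Framework is disconnected";
    return result;
  }

  result.accepted = true;
  return result;
}
```

      In this sketch a rejected call always carries a non-empty error message, so the caller can turn it into an error event that makes the scheduler abort rather than silently dropping the call.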

            People

              anandmazumdar Anand Mazumdar
              kaysoky Joseph Wu
              Vinod Kone Vinod Kone
              Votes: 0
              Watchers: 4
