Mesos / MESOS-8237

Strip (Offer|Resource).allocation_info for non-MULTI_ROLE schedulers.


Details

    • Type: Bug
    • Status: Resolved
    • Priority: Major
    • Resolution: Fixed
    • Affects Version/s: None
    • Fix Version/s: 1.3.2, 1.4.2, 1.5.0
    • Component/s: None
    • Sprint: Mesosphere Sprint 69
    • Story Points: 5

    Description

      In support of MULTI_ROLE-capable frameworks, a Resource.allocation_info field was added, and the Resource math in the Mesos library was updated to require matching allocation_info when checking for (in)equality, addability, subtractability, containment, etc. Because the Resource algebra within Mesos now depends on matching allocation_info fields, the Mesos demo frameworks had to be updated to set allocation_info on their Resource objects during the "matching phase", in which offered resources are evaluated so that the framework can launch tasks. See: https://github.com/apache/mesos/commit/c20744a9976b5e83698e9c6062218abb4d2e6b25#diff-298cc6a77862b7ff3422cd06c215ef28R91
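      To make the containment rule concrete, here is a minimal Go sketch, assuming a simplified scalar resource. The types Resource and AllocationInfo and the functions sameAllocation and Contains are hypothetical stand-ins, not the mesos-go API or the Mesos C++ code. A resource built without allocation_info is not contained in an offered resource that carries one; copying the offer's allocation_info during matching (one way to do what the demo-framework change describes) makes them comparable again.

```go
package main

import "fmt"

// AllocationInfo is a hypothetical, simplified stand-in for the
// Resource.AllocationInfo message: the role a resource is allocated to.
type AllocationInfo struct {
	Role string
}

// Resource is a hypothetical, simplified scalar resource; it is not the
// mesos-go or Mesos C++ type.
type Resource struct {
	Name           string
	Scalar         float64
	AllocationInfo *AllocationInfo // nil when allocation_info is unset
}

// sameAllocation reports whether two optional AllocationInfo values
// agree: both unset, or both set to the same role.
func sameAllocation(a, b *AllocationInfo) bool {
	if a == nil || b == nil {
		return a == nil && b == nil
	}
	return a.Role == b.Role
}

// Contains reports whether offered holds at least as much of wanted.
// allocation_info must match for the two to be comparable at all.
func Contains(offered, wanted Resource) bool {
	return offered.Name == wanted.Name &&
		sameAllocation(offered.AllocationInfo, wanted.AllocationInfo) &&
		offered.Scalar >= wanted.Scalar
}

func main() {
	offered := Resource{Name: "cpus", Scalar: 4,
		AllocationInfo: &AllocationInfo{Role: "*"}}
	wanted := Resource{Name: "cpus", Scalar: 1}

	fmt.Println(Contains(offered, wanted)) // false: allocation_info differs

	// Hypothetical matching-phase step: copy allocation_info from the offer.
	wanted.AllocationInfo = offered.AllocationInfo
	fmt.Println(Contains(offered, wanted)) // true
}
```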

      This poses a unique problem for *external* libraries that aim to support frameworks that DO as well as frameworks that DO NOT opt in to the MULTI_ROLE capability, specifically libraries that implement Resource algebra consistent with what Mesos implements internally. One such library is mesos-go, though there are undoubtedly others. The problem can be explained via this scenario:

      Flo's mesos-go framework is running well. It doesn't opt in to MULTI_ROLE because it doesn't need multiple roles, and it runs on a version of Mesos that predates MULTI_ROLE support. His DC operator upgrades the Mesos cluster to the latest version. Flo rebuilds his framework against the latest version of mesos-go and relaunches it on the cluster. He observes that his framework receives offers but rejects ALL of them. Digging into the code, he realizes that Mesos is injecting allocation_info into the Resource objects offered to his framework, and mesos-go (now MULTI_ROLE compatible) considers allocation_info when comparing Resource objects, but his framework doesn't account for this when preparing its own Resource objects before the "resource matching phase". As a consequence, Flo's framework tries to match against offered Resources that will never align: it never sets an allocation_info that could match the one Mesos always injects, regardless of whether the framework is MULTI_ROLE capable (in this case, it is not).
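      The scenario can be reproduced with a small Go sketch, assuming the same simplified, hypothetical types (Resource, AllocationInfo) and two hypothetical helpers, matches and acceptedOffers; none of these are mesos-go APIs. The framework's own resource never sets allocation_info, so it matches every offer before the upgrade and none after it, once each offered resource carries an injected allocation_info.

```go
package main

import "fmt"

// Hypothetical, simplified stand-ins for the Mesos protobufs; not mesos-go.
type AllocationInfo struct{ Role string }

type Resource struct {
	Name           string
	Scalar         float64
	AllocationInfo *AllocationInfo // nil when unset
}

// matches applies allocation_info-aware containment: names and
// allocation_info must agree, and the offered amount must be large enough.
func matches(offered, wanted Resource) bool {
	a, b := offered.AllocationInfo, wanted.AllocationInfo
	sameAlloc := (a == nil && b == nil) || (a != nil && b != nil && a.Role == b.Role)
	return offered.Name == wanted.Name && sameAlloc && offered.Scalar >= wanted.Scalar
}

// acceptedOffers counts offers containing a resource that satisfies wanted.
// wanted is built by the framework itself and never sets allocation_info.
func acceptedOffers(offers [][]Resource, wanted Resource) int {
	n := 0
	for _, offer := range offers {
		for _, r := range offer {
			if matches(r, wanted) {
				n++
				break
			}
		}
	}
	return n
}

func main() {
	wanted := Resource{Name: "cpus", Scalar: 1}

	// Before the upgrade: offered resources carry no allocation_info.
	before := [][]Resource{{{Name: "cpus", Scalar: 2}}, {{Name: "cpus", Scalar: 4}}}

	// After the upgrade: allocation_info is injected into every offered
	// resource, whether or not the framework is MULTI_ROLE capable.
	alloc := &AllocationInfo{Role: "*"}
	after := [][]Resource{
		{{Name: "cpus", Scalar: 2, AllocationInfo: alloc}},
		{{Name: "cpus", Scalar: 4, AllocationInfo: alloc}},
	}

	fmt.Println(acceptedOffers(before, wanted)) // 2
	fmt.Println(acceptedOffers(after, wanted))  // 0: every offer is rejected
}
```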

      If Mesos were to strip allocation_info from Resource objects before offering them to non-MULTI_ROLE frameworks, the problem illustrated above would go away.
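      The proposed behavior can be sketched in Go, assuming hypothetical types (Resource, AllocationInfo, Framework with a MultiRole flag) and a hypothetical helper prepareOffer. Mesos itself is written in C++, so this is only an illustration of the idea, not the change that resolved this issue. The helper clears allocation_info on a copy of the offered resources when the framework has not opted in to MULTI_ROLE, and leaves it in place otherwise.

```go
package main

import "fmt"

// Hypothetical, simplified stand-ins; not the Mesos master's C++ types.
type AllocationInfo struct{ Role string }

type Resource struct {
	Name           string
	Scalar         float64
	AllocationInfo *AllocationInfo // nil when unset
}

type Framework struct {
	Name      string
	MultiRole bool // true if the framework opted in to MULTI_ROLE
}

// prepareOffer returns the resources to offer to fw. For frameworks that
// did not opt in to MULTI_ROLE, allocation_info is cleared on a copy so
// the caller's slice is left untouched.
func prepareOffer(fw Framework, resources []Resource) []Resource {
	out := make([]Resource, len(resources))
	copy(out, resources)
	if !fw.MultiRole {
		for i := range out {
			out[i].AllocationInfo = nil
		}
	}
	return out
}

func main() {
	alloc := &AllocationInfo{Role: "*"}
	offered := []Resource{{Name: "cpus", Scalar: 4, AllocationInfo: alloc}}

	legacy := prepareOffer(Framework{Name: "flo", MultiRole: false}, offered)
	multi := prepareOffer(Framework{Name: "mr", MultiRole: true}, offered)

	fmt.Println(legacy[0].AllocationInfo == nil)  // true: stripped
	fmt.Println(multi[0].AllocationInfo != nil)   // true: kept
	fmt.Println(offered[0].AllocationInfo != nil) // true: input unchanged
}
```

Working on a copy keeps the master's own bookkeeping (which still needs allocation_info) separate from what a non-MULTI_ROLE framework sees.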


          People

            Assignee: Benjamin Mahler (bmahler)
            Reporter: James DeFelice (jdef)