  1. Mesos
  2. MESOS-8350

Resource provider-capable agents not correctly synchronizing checkpointed agent resources on reregistration

Details

    • Type: Bug
    • Status: Resolved
    • Priority: Critical
    • Resolution: Fixed
    • Affects Version/s: None
    • Fix Version/s: 1.5.0, 1.6.0
    • Component/s: master
    • Labels: None

    Description

      For resource provider-capable agents, the master does not re-send checkpointed resources on agent reregistration; instead, the master should use the checkpointed resources that the agent sends as part of the ReregisterSlaveMessage.

      This is not what happens in practice. If, for example, checkpointing of an offer operation fails and the agent then fails over, the resources affected by that operation are, as expected, not reflected in the agent, but the master still assumes them.

      A workaround is to fail over the master, which leads the newly elected master to bootstrap the agent's state from the ReregisterSlaveMessage.
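The mismatch described above can be illustrated with a toy model. This is only a sketch under stated assumptions, not Mesos code: `ToyMaster`, `ReregisterMessage`, `reregister_buggy`, `reregister_fixed`, and `simulate` are hypothetical names, and resources are modeled as plain strings. It contrasts a master that keeps its stale view for an already-known agent with one that adopts the resources carried in the reregistration message.

```python
from dataclasses import dataclass, field


@dataclass
class ReregisterMessage:
    """Hypothetical stand-in for ReregisterSlaveMessage (fields are assumptions)."""
    agent_id: str
    checkpointed_resources: frozenset
    resource_provider_capable: bool


@dataclass
class ToyMaster:
    # agent_id -> the master's view of that agent's checkpointed resources
    checkpointed: dict = field(default_factory=dict)

    def reregister_buggy(self, msg):
        # Models the reported behavior: for an agent the master already
        # knows, the stale in-memory view is kept and the message is ignored.
        if msg.agent_id not in self.checkpointed:
            self.checkpointed[msg.agent_id] = msg.checkpointed_resources

    def reregister_fixed(self, msg):
        # Models the intended behavior: a resource provider-capable agent
        # is the source of truth, so the master adopts what it reports.
        if msg.resource_provider_capable:
            self.checkpointed[msg.agent_id] = msg.checkpointed_resources


def simulate(reregister):
    """Replay the failure scenario using the given reregistration handler."""
    master = ToyMaster()
    # The master applied an operation and assumes the reservation exists.
    master.checkpointed["agent-1"] = frozenset({"disk(reserved):10"})
    # The agent failed to checkpoint it, failed over, and reports nothing.
    msg = ReregisterMessage("agent-1", frozenset(), True)
    reregister(master, msg)
    return master.checkpointed["agent-1"]
```

In this model, `simulate(ToyMaster.reregister_buggy)` leaves the master assuming a reservation the agent no longer has, while `simulate(ToyMaster.reregister_fixed)` yields an empty set that matches the agent. The workaround corresponds to starting from a fresh `ToyMaster()`: with no prior view, even the buggy handler takes the agent's state from the message.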

          People

            Assignee: Benjamin Bannier (bbannier)
            Reporter: Benjamin Bannier (bbannier)
            Shepherd: Jie Yu
