MESOS-8058

Agent and master can race when updating agent state.

Details

    • Type: Bug
    • Status: Resolved
    • Priority: Critical
    • Resolution: Fixed
    • Affects Version/s: 1.5.0
    • Fix Version/s: 1.5.0
    • Component/s: agent

    Description

      In commit 2af9a5b07dc80151154264e974d03f56a1c25838 we introduced the use of UpdateSlaveMessage for the agent to inform the master about its current total resources. Currently, this message is sent only on agent registration and reregistration.

      This can race with operations applied in the master and communicated via CheckpointResourcesMessage.

      Example:

      1. Agent (cpus:4(*)) registers.
      2. Master is triggered to apply an operation to the agent's resources, e.g., a reservation: cpus:4(*) -> cpus:4(A). The master applies the operation to its current view of the agent's resources and sends the agent a CheckpointResourcesMessage so the agent can persist the result.
      3. The agent sends the master an UpdateSlaveMessage reporting, e.g., cpus:4(*), since it has not yet received the CheckpointResourcesMessage.
      4. The master processes the UpdateSlaveMessage and updates its view of the agent's resources to be cpus:4(*).
      5. The agent processes the CheckpointResourcesMessage and updates its view of its resources to be cpus:4(A).
      6. The agent and the master have an inconsistent view of the agent's resources.
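
The interleaving above can be replayed in a small, single-threaded sketch. This is an illustrative model only, not Mesos code: the `Master` and `Agent` classes, their method names, and the use of plain strings in place of resource objects are all assumptions made for the example. Only the message names and the resource values come from the steps above.

```python
from dataclasses import dataclass


@dataclass
class Master:
    # Hypothetical stand-in for the master's view of the agent's total resources.
    agent_resources: str = ""

    def on_register(self, resources):
        self.agent_resources = resources

    def apply_operation(self, new_resources):
        # Apply an operation (e.g., a reservation) to the master's view and
        # return the payload of the CheckpointResourcesMessage for the agent.
        self.agent_resources = new_resources
        return new_resources

    def on_update_slave_message(self, total):
        # The master overwrites its view with whatever the agent reported.
        self.agent_resources = total


@dataclass
class Agent:
    # Hypothetical stand-in for the agent's own view of its resources.
    resources: str

    def update_slave_message(self):
        # Report the agent's current total resources.
        return self.resources

    def on_checkpoint_resources_message(self, resources):
        self.resources = resources


def simulate_race():
    agent = Agent("cpus:4(*)")
    master = Master()
    master.on_register(agent.resources)                # step 1
    checkpoint = master.apply_operation("cpus:4(A)")   # step 2: message in flight
    update = agent.update_slave_message()              # step 3: stale total
    master.on_update_slave_message(update)             # step 4
    agent.on_checkpoint_resources_message(checkpoint)  # step 5
    return master.agent_resources, agent.resources     # step 6: views diverge
```

Calling `simulate_race()` returns the master's view followed by the agent's view. With this ordering the two values differ, which is the inconsistency described in step 6.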

            People

              Assignee: Benjamin Bannier (bbannier)
              Reporter: Benjamin Bannier (bbannier)
              Shepherd: Alex R