  1. Mesos
  2. MESOS-9293

If a framework loses operation information, it cannot reconcile to acknowledge updates.

Details

    Description

      Normally, frameworks are expected to checkpoint the agent ID and resource provider ID before accepting an offer with an OfferOperation. This expectation is the source of the requirement in the v1 scheduler API that a framework must provide the agent ID and resource provider ID when acknowledging an offer operation status update. However, the expectation breaks down in at least two cases:

      1. The framework might lose its checkpointed data, so it no longer remembers the agent ID or the resource provider ID.

      2. Even if the framework checkpoints data, it could be sent a stale update: for example, the original ACK it sent to Mesos was lost, and it needs to re-ACK. If the framework deleted its checkpointed data after sending the ACK that was then dropped, it no longer has the agent ID or resource provider ID for the operation when the status update is replayed.

      An easy remedy would be to add the agent ID and resource provider ID to the OperationStatus message received by the scheduler so that a framework can build a proper ACK for the update, even if it doesn't have access to its previously checkpointed information.
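
      To illustrate the proposed remedy, here is a minimal sketch of how a scheduler could build the ACK from the update alone. It assumes the status update carries `agent_id` and `resource_provider_id` fields as suggested above; the dictionary layout is only loosely modeled on the v1 scheduler API's JSON form, and `build_operation_ack` is a hypothetical helper, not part of Mesos.

```python
from typing import Any, Dict


def build_operation_ack(framework_id: str, status: Dict[str, Any]) -> Dict[str, Any]:
    """Build an acknowledgement call body from a received status update only.

    Assumption: `status` carries `agent_id` and `resource_provider_id`,
    as proposed in this ticket. No checkpointed data is consulted.
    """
    # Updates without a UUID are not retried, so there is nothing to ACK.
    if "uuid" not in status:
        raise ValueError("status update has no UUID; no ACK is needed")

    ack: Dict[str, Any] = {
        "operation_id": status["operation_id"],
        "uuid": status["uuid"],
    }

    # Copy the IDs straight from the update when they are present.
    for key in ("agent_id", "resource_provider_id"):
        if key in status:
            ack[key] = status[key]

    return {
        "framework_id": {"value": framework_id},
        "type": "ACKNOWLEDGE_OPERATION_STATUS",
        "acknowledge_operation_status": ack,
    }
```

      With this shape, a replayed update can be re-acknowledged even after the framework has deleted or lost its own record of the operation, because everything the ACK needs travels with the update itself.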

      I'm filing this as a BUG because there's no way to reliably use the offer operation status API until this has been fixed.

            People

              bbannier Benjamin Bannier
              jdef James DeFelice
              Greg Mann Greg Mann
              Votes: 0
              Watchers: 5
