Mesos / MESOS-9573

Agent should not try to recover operation status update streams that haven't been created yet.


Details

    • Type: Bug
    • Status: Resolved
    • Priority: Major
    • Resolution: Fixed
    • Affects Version/s: None
    • Fix Version/s: 1.8.0
    • Component/s: agent

    Description

      If the agent fails over after checkpointing a new operation but before creating the corresponding operation status update stream, recovery fails.

      This happens because the agent tries to recover the operation status update stream of every checkpointed operation, even if that stream has not been created yet.

      To prevent such recovery failures, the agent should obtain the IDs of the streams to recover by walking the directory in which operation status update streams are stored.

      The agent should also garbage-collect any stream for which the checkpointed state does not contain a corresponding operation.
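      The garbage-collection rule can be sketched as a partition of the stream IDs found on disk against the IDs of the checkpointed operations. This sketch assumes, hypothetically, that a stream's ID equals the ID of the operation it belongs to; `StreamRecoveryPlan` and `planStreamRecovery` are illustrative names, not Mesos APIs.

```cpp
#include <set>
#include <string>
#include <vector>

// Hypothetical result type (not a Mesos API): which on-disk streams to
// recover and which to garbage-collect.
struct StreamRecoveryPlan {
  std::vector<std::string> toRecover;         // has a checkpointed operation
  std::vector<std::string> toGarbageCollect;  // orphaned: no operation
};

// Sketch: decide the fate of each stream found on disk, ASSUMING a stream
// ID equals the ID of the operation it belongs to. Checkpointed operations
// with no stream on disk never appear in the plan: their streams were never
// created, so there is nothing to recover.
StreamRecoveryPlan planStreamRecovery(
    const std::vector<std::string>& streamIdsOnDisk,
    const std::set<std::string>& checkpointedOperationIds) {
  StreamRecoveryPlan plan;
  for (const std::string& id : streamIdsOnDisk) {
    if (checkpointedOperationIds.count(id) > 0) {
      plan.toRecover.push_back(id);
    } else {
      plan.toGarbageCollect.push_back(id);
    }
  }
  return plan;
}
```

      For example, with streams `op-1`, `op-2`, and `op-9` on disk and checkpointed operations `op-1`, `op-2`, and `op-3`, the plan recovers `op-1` and `op-2` and garbage-collects the orphaned `op-9`. `op-3`, the case from this bug, is checkpointed but has no stream, so recovery simply skips it instead of failing.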


          People

            Assignee: Gastón Kleiman (gkleiman)
            Reporter: Gastón Kleiman (gkleiman)
            Greg Mann
            Votes: 0
            Watchers: 1
