Mesos / MESOS-10018

Duplicate tasks if agent partitioned during maintenance down


    Details

    • Type: Bug
    • Status: Resolved
    • Priority: Major
    • Resolution: Fixed
    • Affects Version/s: None
    • Fix Version/s: 1.7.3, 1.8.2, 1.9.1, 1.10.0
    • Component/s: None
    • Labels:
    • Target Version/s:
    • Sprint:
      Foundations: RI-19 57, Foundations: RI-20 58
    • Story Points:
      5

      Description

      When the master starts maintenance on a node, it

      (1) sends a ShutdownMessage to the node's agent, and
      (2) removes the agent (slave), which transitions all of its tasks to TASK_LOST
      and moves them to the completed task set.
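      The two steps can be illustrated with a heavily simplified, hypothetical model of the master's bookkeeping. `MasterState`, `start_maintenance`, and the `outbox` list are illustrative names, not Mesos APIs. The point the sketch makes is that step (2) updates the master's state immediately and does not wait for the agent to act on the message sent in step (1).

```python
from dataclasses import dataclass, field

# Hypothetical model of master-side bookkeeping; not Mesos code.

@dataclass
class Task:
    task_id: str
    state: str = "TASK_RUNNING"

@dataclass
class MasterState:
    # agent_id -> {task_id: Task} for tasks the master believes are running
    active: dict = field(default_factory=dict)
    # task_id -> Task for tasks the master considers terminal
    completed: dict = field(default_factory=dict)
    # messages "sent" to agents, recorded as (agent_id, message_name)
    outbox: list = field(default_factory=list)

    def start_maintenance(self, agent_id: str) -> None:
        # Step (1): ask the agent to shut down. Delivery is not guaranteed.
        self.outbox.append((agent_id, "ShutdownMessage"))
        # Step (2): remove the agent right away; every task becomes
        # TASK_LOST and moves to the completed set.
        for task in self.active.pop(agent_id, {}).values():
            task.state = "TASK_LOST"
            self.completed[task.task_id] = task

# Usage: one agent running one task enters maintenance.
master = MasterState(active={"agent-1": {"t1": Task("t1")}})
master.start_maintenance("agent-1")
print(master.completed["t1"].state)  # TASK_LOST
```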

      If the agent does not fully process the ShutdownMessage (e.g., the message is dropped between (1) and (2), or the agent process is killed before the executor has shut down), the agent can come back with the lost task still running. When the agent reregisters, it reports that task to the master, which adds it to the list of active tasks. The same task is then both completed and active.
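      The race can be reproduced with a small, hypothetical model in which task sets are plain Python sets of task IDs. `reregister_naive` mirrors the behavior described above, where the master trusts the agent's report. `reregister_guarded` shows one possible guard, assumed purely for illustration: tasks already in the completed set are not resurrected and are returned as tasks to kill instead. This guard is not necessarily how the fix shipped in the listed Fix Versions works.

```python
# Hypothetical model of the reregistration race; not Mesos code.

def reregister_naive(active, completed, agent_id, reported):
    # The master trusts the agent's report and marks every reported task active.
    active.setdefault(agent_id, set()).update(reported)
    return []  # nothing to kill

def reregister_guarded(active, completed, agent_id, reported):
    # Hypothetical guard: a task the master already considers completed is
    # not re-added to the active set; the agent should kill it instead.
    to_kill = [t for t in reported if t in completed]
    active.setdefault(agent_id, set()).update(
        t for t in reported if t not in completed)
    return to_kill

def duplicated(active, completed):
    # Tasks that are simultaneously active and completed (the bug).
    return {t for tasks in active.values() for t in tasks} & set(completed)

# State after maintenance removed agent-1: t1 was moved to the completed set.
completed = {"t1"}

# The ShutdownMessage was lost, so agent-1 reregisters still running t1.
naive_active = {}
reregister_naive(naive_active, completed, "agent-1", ["t1"])
print(duplicated(naive_active, completed))  # {'t1'}

guarded_active = {}
kills = reregister_guarded(guarded_active, completed, "agent-1", ["t1"])
print(duplicated(guarded_active, completed), kills)  # set() ['t1']
```

      Keeping the check in a separate `duplicated` helper makes the invariant ("no task is both active and completed") easy to assert after any state transition, not only after reregistration.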


            People

            • Assignee:
              bbannier Benjamin Bannier
              Reporter:
              bbannier Benjamin Bannier
              Shepherd:
              Greg Mann
            • Votes:
              0
              Watchers:
              1

              Dates

              • Created:
                Updated:
                Resolved: