Mesos / MESOS-4659

Avoid leaving orphan task after framework failure + master failover

    Details

    • Type: Bug
    • Status: Accepted
    • Priority: Major
    • Resolution: Unresolved
    • Affects Version/s: None
    • Fix Version/s: None
    • Component/s: master

      Description

      If a framework disconnects from the master, the master kills its tasks once the framework's failover_timeout has elapsed.

      However, if a master failover occurs and a framework never reconnects to the new master, none of the tasks associated with that framework are ever killed. These tasks remain orphaned and would presumably need to be removed manually by the operator. Similarly, if a framework is torn down or disconnects while it has running tasks on a partitioned agent, those tasks are not shut down when the agent reregisters.

      We should consider whether to kill such orphaned tasks automatically, likely after waiting for some (framework-configurable?) timeout.
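
      One way to picture the proposal is the toy model below. It is only a sketch under stated assumptions, not Mesos code: the class `OrphanTracker`, its methods, and the `orphan_timeout` parameter are all hypothetical names invented for illustration. It assumes the new master learns about orphaned tasks from reregistering agents, starts a per-framework timer when it first sees such a task, and kills the tasks if the framework has not reregistered by the time the timer expires.

```python
from dataclasses import dataclass, field


@dataclass
class OrphanTracker:
    """Toy model of a failed-over master tracking tasks of unknown frameworks.

    All names here are hypothetical; this is not the Mesos master API.
    """
    orphan_timeout: float  # seconds to wait for the framework to reregister
    # framework_id -> time the new master first saw an orphaned task for it
    first_seen: dict = field(default_factory=dict)
    # framework_id -> set of orphaned task ids reported by agents
    tasks: dict = field(default_factory=dict)

    def agent_reregistered(self, now, reported, known_frameworks):
        """Record tasks whose framework is unknown to this master.

        `reported` maps framework_id -> iterable of task ids on the agent.
        """
        for framework_id, task_ids in reported.items():
            if framework_id in known_frameworks:
                continue  # framework is connected, so nothing is orphaned
            self.first_seen.setdefault(framework_id, now)
            self.tasks.setdefault(framework_id, set()).update(task_ids)

    def framework_reregistered(self, framework_id):
        """The framework came back, so its tasks are no longer orphans."""
        self.first_seen.pop(framework_id, None)
        self.tasks.pop(framework_id, None)

    def reap(self, now):
        """Return the task ids to kill: orphans whose timeout has expired."""
        expired = [f for f, t in self.first_seen.items()
                   if now - t >= self.orphan_timeout]
        to_kill = []
        for framework_id in expired:
            to_kill.extend(sorted(self.tasks.pop(framework_id)))
            del self.first_seen[framework_id]
        return to_kill


tracker = OrphanTracker(orphan_timeout=60.0)
tracker.agent_reregistered(
    now=0.0,
    reported={"fw-a": ["t1", "t2"], "fw-b": ["t3"]},
    known_frameworks=set(),
)
tracker.framework_reregistered("fw-b")  # fw-b reconnects in time
print(tracker.reap(now=30.0))  # [] (timeout not yet reached)
print(tracker.reap(now=60.0))  # ['t1', 't2'] (fw-a never came back)
```

      In this sketch the timeout is a single master-wide value. Making it framework-configurable, as the description suggests, would require the new master to know the framework's setting before the framework reconnects, for example by having agents report it; that is an open design question rather than something this sketch resolves.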

              People

              • Assignee: Unassigned
              • Reporter: Neil Conway (neilc)