Mesos / MESOS-4659

Avoid leaving orphan task after framework failure + master failover

Details

    • Type: Bug
    • Status: Accepted
    • Priority: Major
    • Resolution: Unresolved
    • None
    • None
    • master

    Description

      If a framework becomes disconnected from the master, its tasks are killed once the framework's failover_timeout has elapsed.

      However, if a master failover occurs and a framework never reconnects to the new master, we never kill any of the tasks associated with that framework. These tasks remain orphaned and presumably have to be removed manually by the operator. Similarly, if a framework is torn down or disconnects while it has tasks running on a partitioned agent, those tasks are not shut down when the agent reregisters.

      We should consider whether to kill such orphaned tasks automatically, likely after waiting for some (framework-configurable?) timeout.
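
      To make the proposal concrete, here is a minimal Python sketch of the bookkeeping a newly elected master might keep for orphaned tasks. It is not Mesos code: the OrphanTracker class, its method names, and the orphan_timeout_secs parameter are hypothetical and exist only for illustration. It assumes the master learns about tasks from reregistering agents and starts a timer the first time it sees a task whose framework has not reregistered.

```python
from dataclasses import dataclass, field


@dataclass
class OrphanTracker:
    """Hypothetical orphan-task bookkeeping (illustrative only, not a Mesos API)."""

    orphan_timeout_secs: float
    # framework_id -> time its tasks were first seen without a registered framework
    first_seen: dict = field(default_factory=dict)
    # framework_id -> set of task ids reported by reregistering agents
    tasks: dict = field(default_factory=dict)

    def agent_reported_task(self, framework_id, task_id, now):
        """An agent reregistered and reported a task whose framework is unknown."""
        self.first_seen.setdefault(framework_id, now)
        self.tasks.setdefault(framework_id, set()).add(task_id)

    def framework_reregistered(self, framework_id):
        """The framework came back, so its tasks are no longer orphans."""
        self.first_seen.pop(framework_id, None)
        self.tasks.pop(framework_id, None)

    def expired(self, now):
        """Return {framework_id: sorted task ids} past the timeout and forget them."""
        due = [
            fw for fw, seen in self.first_seen.items()
            if now - seen >= self.orphan_timeout_secs
        ]
        result = {}
        for fw in due:
            result[fw] = sorted(self.tasks.pop(fw))
            del self.first_seen[fw]
        return result
```

      In this sketch the caller would periodically call expired() and kill the returned tasks. The timeout is a single value for the whole tracker; the framework-configurable variant raised above would need a per-framework value instead, which the master may not have for a framework that never reregisters.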

            People

              Assignee: Unassigned
              Reporter: Neil Conway (neilc)
              Votes: 0
              Watchers: 10

              Dates

                Created:
                Updated: