  Mesos / MESOS-7744

Mesos Agent Sends TASK_KILL status update to Master, and still launches task


Details

    • Type: Bug
    • Status: Resolved
    • Priority: Critical
    • Resolution: Fixed
    • Affects Version/s: 1.0.1
    • Fix Version/s: 1.1.3, 1.2.3, 1.3.2, 1.4.0
    • None

    Description

      We sometimes launch tasks and then kill them if we don't get a TASK_STARTING update back from the agent within ~7 seconds. Under certain conditions this causes Mesos to lose track of the task. The relevant part of the agent log is:

      Jun 29 23:22:26 titusagent-mainvpc-r3.8xlarge.2-i-04907efc9f1f8535c mesos-slave[4290]: I0629 23:22:26.951799  5171 slave.cpp:1495] Got assigned task Titus-7590548-worker-0-4476 for framework TitusFramework
      Jun 29 23:22:26 titusagent-mainvpc-r3.8xlarge.2-i-04907efc9f1f8535c mesos-slave[4290]: I0629 23:22:26.952251  5171 slave.cpp:1614] Launching task Titus-7590548-worker-0-4476 for framework TitusFramework
      Jun 29 23:22:37 titusagent-mainvpc-r3.8xlarge.2-i-04907efc9f1f8535c mesos-slave[4290]: I0629 23:22:37.484611  5171 slave.cpp:1853] Queuing task 'Titus-7590548-worker-0-4476' for executor 'docker-executor' of framework TitusFramework at executor(1)@100.66.11.10:17707
      Jun 29 23:22:37 titusagent-mainvpc-r3.8xlarge.2-i-04907efc9f1f8535c mesos-slave[4290]: I0629 23:22:37.487876  5171 slave.cpp:2035] Asked to kill task Titus-7590548-worker-0-4476 of framework TitusFramework
      Jun 29 23:22:37 titusagent-mainvpc-r3.8xlarge.2-i-04907efc9f1f8535c mesos-slave[4290]: I0629 23:22:37.488994  5171 slave.cpp:3211] Handling status update TASK_KILLED (UUID: 898215d6-a244-4dbe-bc9c-878a22d36ea4) for task Titus-7590548-worker-0-4476 of framework TitusFramework from @0.0.0.0:0
      Jun 29 23:22:37 titusagent-mainvpc-r3.8xlarge.2-i-04907efc9f1f8535c mesos-slave[4290]: I0629 23:22:37.490603  5171 slave.cpp:2005] Sending queued task 'Titus-7590548-worker-0-4476' to executor 'docker-executor' of framework TitusFramework at executor(1)@100.66.11.10:17707
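
To check other agent logs for the same pattern, one can scan for a task that is handed to its executor ("Sending queued task") after a terminal status update was already handled for it. The sketch below is a hypothetical helper (`parse` and `analyze` are illustrative names, not part of Mesos). It assumes glog-style lines like the ones above, tolerates a syslog prefix, and uses a hand-written set of terminal states that may need adjusting for other Mesos versions.

```python
import re
from datetime import datetime

# glog header: severity, MMDD, time, thread id, file:line]  message
GLOG = re.compile(r"([IWEF])(\d{4}) (\d{2}:\d{2}:\d{2}\.\d{6})\s+\d+ \S+\] (.*)")

# Hand-written list of terminal task states (may differ across versions).
TERMINAL = {
    "TASK_FINISHED", "TASK_FAILED", "TASK_KILLED", "TASK_LOST",
    "TASK_ERROR", "TASK_DROPPED", "TASK_GONE", "TASK_GONE_BY_OPERATOR",
}

# The excerpt from this report, with the syslog prefix trimmed for brevity.
LOG = """\
I0629 23:22:26.951799  5171 slave.cpp:1495] Got assigned task Titus-7590548-worker-0-4476 for framework TitusFramework
I0629 23:22:26.952251  5171 slave.cpp:1614] Launching task Titus-7590548-worker-0-4476 for framework TitusFramework
I0629 23:22:37.484611  5171 slave.cpp:1853] Queuing task 'Titus-7590548-worker-0-4476' for executor 'docker-executor' of framework TitusFramework at executor(1)@100.66.11.10:17707
I0629 23:22:37.487876  5171 slave.cpp:2035] Asked to kill task Titus-7590548-worker-0-4476 of framework TitusFramework
I0629 23:22:37.488994  5171 slave.cpp:3211] Handling status update TASK_KILLED (UUID: 898215d6-a244-4dbe-bc9c-878a22d36ea4) for task Titus-7590548-worker-0-4476 of framework TitusFramework from @0.0.0.0:0
I0629 23:22:37.490603  5171 slave.cpp:2005] Sending queued task 'Titus-7590548-worker-0-4476' to executor 'docker-executor' of framework TitusFramework at executor(1)@100.66.11.10:17707
"""


def parse(lines):
    """Yield (timestamp, message) for every glog-formatted line."""
    for line in lines:
        m = GLOG.search(line)
        if m:
            # glog omits the year; that is fine for computing intervals.
            ts = datetime.strptime(f"{m.group(2)} {m.group(3)}", "%m%d %H:%M:%S.%f")
            yield ts, m.group(4)


def analyze(lines):
    """Return (assign-to-kill seconds per task, tasks sent after a terminal update)."""
    assigned, killed, terminal_at, anomalies = {}, {}, {}, []
    for ts, msg in parse(lines):
        if m := re.match(r"Got assigned task (\S+)", msg):
            assigned[m.group(1)] = ts
        elif m := re.match(r"Asked to kill task (\S+)", msg):
            killed[m.group(1)] = ts
        elif m := re.match(r"Handling status update (TASK_\w+) .* for task (\S+)", msg):
            state, task = m.groups()
            if state in TERMINAL:
                terminal_at.setdefault(task, ts)
        elif m := re.match(r"Sending queued task '([^']+)'", msg):
            # The suspicious case: delivery after the task already ended.
            if m.group(1) in terminal_at:
                anomalies.append(m.group(1))
    delays = {t: (killed[t] - assigned[t]).total_seconds() for t in killed if t in assigned}
    return delays, anomalies


if __name__ == "__main__":
    delays, anomalies = analyze(LOG.splitlines())
    print(delays)     # {'Titus-7590548-worker-0-4476': 10.536077}
    print(anomalies)  # ['Titus-7590548-worker-0-4476']
```

On the excerpt above it reports about 10.5 s (agent clock) between "Got assigned task" and "Asked to kill task", and flags Titus-7590548-worker-0-4476 as sent to the executor after TASK_KILLED.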
      

      In our executor, we see the launch message arrive after the master has already received the TASK_KILLED update. The executor then sends non-terminal status updates to the agent, yet the agent does not forward them to our framework. We're using a custom executor based on the older mesos-go bindings.
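
The ordering in the log and in the executor can be reduced to a toy model: the agent queues the task for an executor that is not ready, is asked to kill it, emits TASK_KILLED (which, going by the `from @0.0.0.0:0` sender, appears to be generated by the agent itself rather than the executor), and then still flushes the queued task to the executor. Everything below (`ToyAgent`, `kill_task`, `flush_queue`, `executor_update`, the `guard_killed_queued` switch) is hypothetical and is not Mesos code. The guard shows one plausible fix, not necessarily the change that resolved this issue, and the rule that updates after a terminal state are dropped is an assumption chosen to match the behavior reported here.

```python
from dataclasses import dataclass, field


@dataclass
class ToyAgent:
    """Hypothetical, heavily simplified agent; not Mesos code."""

    guard_killed_queued: bool  # True: remove a killed task from the queue
    queued: list = field(default_factory=list)            # tasks waiting for the executor
    terminal: set = field(default_factory=set)            # tasks with a terminal update
    sent_to_executor: list = field(default_factory=list)  # tasks delivered to the executor
    forwarded: list = field(default_factory=list)         # (task_id, state) sent upstream

    def queue_task(self, task_id):
        # The executor is not ready yet, so the task waits in a queue.
        self.queued.append(task_id)

    def kill_task(self, task_id):
        # Kill of a task the executor has not received yet: the agent
        # itself produces TASK_KILLED for it.
        if task_id in self.queued:
            if self.guard_killed_queued:
                self.queued.remove(task_id)
            self.terminal.add(task_id)
            self.forwarded.append((task_id, "TASK_KILLED"))

    def flush_queue(self):
        # Deliver every task still in the queue to the executor.
        self.sent_to_executor.extend(self.queued)
        self.queued.clear()

    def executor_update(self, task_id, state):
        # Assumption: updates for a task that already has a terminal
        # update are dropped instead of being forwarded to the framework.
        if task_id not in self.terminal:
            self.forwarded.append((task_id, state))


def replay(guard):
    """Replay the ordering from the report: queue, kill, then flush."""
    agent = ToyAgent(guard_killed_queued=guard)
    agent.queue_task("t1")
    agent.kill_task("t1")
    agent.flush_queue()
    # The executor reports on whatever tasks it actually received.
    for task_id in agent.sent_to_executor:
        agent.executor_update(task_id, "TASK_STARTING")
    return agent


if __name__ == "__main__":
    for guard in (False, True):
        a = replay(guard)
        # False ['t1'] [('t1', 'TASK_KILLED')]
        # True [] [('t1', 'TASK_KILLED')]
        print(guard, a.sent_to_executor, a.forwarded)
```

With the guard off, the task reaches the executor while the framework has only ever seen TASK_KILLED, which is the "lost track" state described above; with the guard on, the killed task never leaves the queue.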


            People

              Assignee: Benjamin Mahler (bmahler)
              Reporter: Sargun Dhillon (sargun)
              Votes: 0
              Watchers: 5
