Mesos / MESOS-8459

Executor could linger without ever receiving any tasks


Details

    • Type: Bug
    • Status: Open
    • Priority: Major
    • Resolution: Unresolved
    • Affects Version/s: None
    • Fix Version/s: None
    • Component/s: executor
    • Labels: None

    Description

      An executor's initial tasks may be killed even after it has been registered. In that case, the executor could linger forever.

      In MESOS-8411, we have a short-term fix that checks an executor's completed and terminated task queues to see whether it has ever received any tasks. If the check is false and there are no queued or launched tasks, the agent will shut down the executor.

      However, this check is not bullet-proof. The completedTasks queue is a circular_buffer (current size 200), which means that earlier completed tasks, possibly ones the executor has sent updates for, may be ejected and thus missed by this check. This would lead to false-positive shutdowns.
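
      The failure mode can be illustrated with a small model. This is only a sketch, not the agent's code: `BoundedHistory`, `inferNeverReceivedTask`, and the per-entry `delivered` flag are hypothetical names, and a capped `std::deque` stands in for the circular_buffer.

```cpp
#include <cstddef>
#include <deque>
#include <string>

// Hypothetical stand-in for the bounded completedTasks queue. Once the
// history is full, pushing a new entry drops the oldest one.
class BoundedHistory {
public:
  explicit BoundedHistory(std::size_t capacity) : capacity_(capacity) {}

  void push(const std::string& taskId, bool delivered) {
    if (capacity_ == 0) {
      return;  // Nothing can be retained.
    }
    if (entries_.size() == capacity_) {
      entries_.pop_front();  // Eject the oldest completed task.
    }
    entries_.push_back({taskId, delivered});
  }

  // True if any retained entry records a task the executor received.
  bool anyDelivered() const {
    for (const Entry& entry : entries_) {
      if (entry.delivered) {
        return true;
      }
    }
    return false;
  }

private:
  struct Entry {
    std::string taskId;
    bool delivered;
  };

  std::size_t capacity_;
  std::deque<Entry> entries_;
};

// Assumed shape of the inference: "never received a task" is concluded
// from queue contents alone, so it is only as good as what is retained.
inline bool inferNeverReceivedTask(const BoundedHistory& completed,
                                   std::size_t queued,
                                   std::size_t launched) {
  return queued == 0 && launched == 0 && !completed.anyDelivered();
}
```

      With a capacity of 3, one delivered task followed by three undelivered ones leaves no trace of the delivered task, so the inference wrongly reports that the executor never received anything.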

      Per discussion with vinodkone and bmahler, there are two long-term solutions.

      The first one is to checkpoint additional executor state that indicates whether the executor has ever received any tasks (no more inference from task queue status).
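
      A minimal sketch of this first option, assuming a single sticky flag is enough. `ExecutorCheckpoint`, its one-character serialization, and `shouldShutdownIdleExecutor` are hypothetical and do not reflect Mesos's actual checkpoint format.

```cpp
#include <cstddef>
#include <string>

// Hypothetical checkpointed state: a flag that is set once and never
// cleared, so it survives any amount of task history being ejected.
struct ExecutorCheckpoint {
  bool everReceivedTask = false;

  void markTaskReceived() { everReceivedTask = true; }

  // Toy serialization for illustration only.
  std::string serialize() const { return everReceivedTask ? "1" : "0"; }

  static ExecutorCheckpoint deserialize(const std::string& data) {
    ExecutorCheckpoint checkpoint;
    checkpoint.everReceivedTask = (data == "1");
    return checkpoint;
  }
};

// The shutdown decision reads the flag directly instead of inferring
// it from the completed or terminated task queues.
inline bool shouldShutdownIdleExecutor(const ExecutorCheckpoint& checkpoint,
                                       std::size_t queued,
                                       std::size_t launched) {
  return !checkpoint.everReceivedTask && queued == 0 && launched == 0;
}
```

      Because the flag is restored from the checkpoint after an agent restart, the decision no longer depends on how many completed tasks are still retained.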

      The alternative is to add timeouts in the executor driver to shut down lingering executors automatically.
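
      One way such a timeout could look, as a sketch only: `IdleWatchdog` is a hypothetical helper, not part of the executor driver, and the choice to start the timer at registration is an assumption.

```cpp
#include <chrono>

using Clock = std::chrono::steady_clock;

// Hypothetical watchdog: the executor is considered lingering if no task
// has arrived within `timeout` of registration. Receiving any task
// disarms it permanently.
class IdleWatchdog {
public:
  IdleWatchdog(Clock::time_point registeredAt, Clock::duration timeout)
    : registeredAt_(registeredAt), timeout_(timeout) {}

  void onTaskReceived() { receivedTask_ = true; }

  // True once the timeout has elapsed without any task being received.
  bool expired(Clock::time_point now) const {
    return !receivedTask_ && (now - registeredAt_) >= timeout_;
  }

private:
  Clock::time_point registeredAt_;
  Clock::duration timeout_;
  bool receivedTask_ = false;
};
```

      The driver would poll `expired(Clock::now())` or schedule a one-shot timer and shut itself down when it returns true. Taking the time as a parameter keeps the logic testable without sleeping.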

       

            People

              Assignee: Unassigned
              Reporter: Meng Zhu (mzhu)
