Mesos / MESOS-9541

Transition agent operations to some "lost" state when the agent is removed.


Details

    • Type: Improvement
    • Status: Open
    • Priority: Major
    • Resolution: Unresolved
    • Affects Version/s: 1.5.0, 1.5.1, 1.5.2, 1.6.0, 1.6.1, 1.7.0, 1.7.1
    • Fix Version/s: None
    • Component/s: None

    Description

      MESOS-8782 and MESOS-8783 transition operations to OPERATION_GONE_BY_OPERATOR or OPERATION_UNREACHABLE when their agents are marked as gone or unreachable, respectively. However, there are other cases in which agents can be "removed" and forgotten by the master:
      1) When an agent tries to register with a new ID from the same IP:
      https://github.com/apache/mesos/blob/f130544bdb8a9849096ee2cb35ebcbc7d8a326d8/src/master/master.cpp#L6836-L6849
      2) When an agent requests to unregister:
      https://github.com/apache/mesos/blob/f130544bdb8a9849096ee2cb35ebcbc7d8a326d8/src/master/master.cpp#L7817-L7840

      In these cases, the master explicitly sends TASK_LOST status updates for the agent's tasks (which also means that this documentation is wrong), but does nothing for its operations. We should design proper operation status transitions for these cases.
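      The gap can be summarized as a decision table from the cause of agent removal to the state that the agent's operations should move to. The following is a minimal, hypothetical C++ sketch: the enum and function names (AgentRemovalReason, OperationState, operationStateOnRemoval) are invented for illustration and are not Mesos's actual protobuf types or master code. Only the state names OPERATION_GONE_BY_OPERATOR and OPERATION_UNREACHABLE come from this issue. The two cases listed above return std::nullopt because their transitions are exactly what remains to be designed.

```cpp
#include <cassert>
#include <optional>

// Hypothetical enums for illustration only; they mirror the names used in
// this issue but are NOT Mesos's real types.
enum class AgentRemovalReason {
  MARKED_GONE,          // Handled by MESOS-8782.
  MARKED_UNREACHABLE,   // Handled by MESOS-8783.
  REREGISTERED_NEW_ID,  // Case 1: agent registers with a new ID from the same IP.
  UNREGISTERED,         // Case 2: agent requests to unregister.
};

enum class OperationState {
  OPERATION_GONE_BY_OPERATOR,
  OPERATION_UNREACHABLE,
};

// Returns the state that operations on a removed agent should transition to,
// or std::nullopt where no transition is defined yet (the gap this issue tracks).
std::optional<OperationState> operationStateOnRemoval(AgentRemovalReason reason) {
  switch (reason) {
    case AgentRemovalReason::MARKED_GONE:
      return OperationState::OPERATION_GONE_BY_OPERATOR;
    case AgentRemovalReason::MARKED_UNREACHABLE:
      return OperationState::OPERATION_UNREACHABLE;
    case AgentRemovalReason::REREGISTERED_NEW_ID:
    case AgentRemovalReason::UNREGISTERED:
      return std::nullopt;  // Still to be designed.
  }
  return std::nullopt;  // Unreachable for valid enum values.
}
```

      Returning std::optional keeps the undecided cases explicit rather than silently mapping them to an existing state.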


            People

              greggomann Greg Mann
              chhsia0 Chun-Hung Hsiao
              Gastón Kleiman
              Votes: 0
              Watchers: 2
