Details

    • Sprint:
      Mesosphere Sprint 65
    • Story Points:
      1

      Description

      Reported by Tomasz Janiszewski

      Hi

      When task has enabled Mesos healthcheck start time in UI can show wrong
      time. This happens because UI assumes that first status is task started
      [0]. This is not always true because Mesos keeps only recent tasks statuses
      [1] so when healthcheck updates tasks status it can override task start
      time displayed in webui.

      Best
      Tomek

      [0]
      https://github.com/apache/mesos/blob/master/src/webui/master/static/js/controllers.js#L140
      [1]
      https://github.com/apache/mesos/blob/f2adc8a95afda943f6a10e771aad64300da19047/src/common/protobuf_utils.cpp#L263-L265

        Activity

        Hide
        alexr Alexander Rukletsov added a comment -

        Closing this since MESOS-7941 has landed.

        Show
        alexr Alexander Rukletsov added a comment - Closing this since MESOS-7941 has landed.
        Hide
        gkleiman Gastón Kleiman added a comment - - edited

        We discussed this briefly with Vinod Kone and Greg Mann.

        The consensus was that to fully solve this problem we should probably do both things:

        1. Add (and checkpoint) a start_time field.
        2. Make the build-in executors send TASK_STARTING updates.

        Sending the TASK_STARTING update approach is not enough because it might be lost after a master failover.

        We should be careful when updating the built-in executors. Even though the state exists since the dawn of Mesos, some schedulers might not be able to handle them in the right way, e.g., Chronos: https://github.com/mesos/chronos/blob/3758d2c1ab4f517a21bd7d7c0488540cbecc2fc0/src/main/scala/org/apache/mesos/chronos/scheduler/mesos/MesosJobFramework.scala#L199-L208

        Show
        gkleiman Gastón Kleiman added a comment - - edited We discussed this briefly with Vinod Kone and Greg Mann . The consensus was that to fully solve this problem we should probably do both things: 1. Add (and checkpoint) a start_time field. 2. Make the build-in executors send TASK_STARTING updates. Sending the TASK_STARTING update approach is not enough because it might be lost after a master failover. We should be careful when updating the built-in executors. Even though the state exists since the dawn of Mesos, some schedulers might not be able to handle them in the right way, e.g., Chronos: https://github.com/mesos/chronos/blob/3758d2c1ab4f517a21bd7d7c0488540cbecc2fc0/src/main/scala/org/apache/mesos/chronos/scheduler/mesos/MesosJobFramework.scala#L199-L208
        Hide
        janisz Tomasz Janiszewski added a comment -

        This looks like the easiest and cleanest way to solve this problem.

        Show
        janisz Tomasz Janiszewski added a comment - This looks like the easiest and cleanest way to solve this problem.
        Hide
        alexr Alexander Rukletsov added a comment -

        Tomasz Janiszewski yes, by all built-in executors. I don't think we should introduce a capability for this state because it exists since the dawn of Mesos: https://github.com/apache/mesos/blob/91df19465d993c6977cf15a68d75943712d81bed/include/mesos.proto#L111 : ). Most of the scheduler support it already, including Marathon and Aurora.

        Show
        alexr Alexander Rukletsov added a comment - Tomasz Janiszewski yes, by all built-in executors. I don't think we should introduce a capability for this state because it exists since the dawn of Mesos: https://github.com/apache/mesos/blob/91df19465d993c6977cf15a68d75943712d81bed/include/mesos.proto#L111 : ). Most of the scheduler support it already, including Marathon and Aurora.
        Hide
        janisz Tomasz Janiszewski added a comment -

        Alexander Rukletsov TASK_STARTING will be sent by executor? Do we need to add it to the default executor? Will it be backward compatible with frameworks using older libmesos? This could be done analogically to https://issues.apache.org/jira/browse/MESOS-4547

        Show
        janisz Tomasz Janiszewski added a comment - Alexander Rukletsov TASK_STARTING will be sent by executor? Do we need to add it to the default executor? Will it be backward compatible with frameworks using older libmesos? This could be done analogically to https://issues.apache.org/jira/browse/MESOS-4547
        Hide
        alexr Alexander Rukletsov added a comment -

        Alternatively, we can send TASK_STARTING before TASK_RUNNING and use its timestamp in the UI.

        Show
        alexr Alexander Rukletsov added a comment - Alternatively, we can send TASK_STARTING before TASK_RUNNING and use its timestamp in the UI.
        Hide
        janisz Tomasz Janiszewski added a comment -

        Yes, I think this could work. Do we have guarantee that TaskState arrives in order, so there won't be a situation when we set start time and then get update with earlier timestamp.

        Show
        janisz Tomasz Janiszewski added a comment - Yes, I think this could work. Do we have guarantee that TaskState arrives in order, so there won't be a situation when we set start time and then get update with earlier timestamp.
        Hide
        haosdent@gmail.com haosdent added a comment -

        ping Tomasz Janiszewski Do you think is this way acceptable?

        Show
        haosdent@gmail.com haosdent added a comment - ping Tomasz Janiszewski Do you think is this way acceptable?
        Hide
        haosdent@gmail.com haosdent added a comment -

        The current behaviour is to set the start time once receive first TaskState https://github.com/apache/mesos/blob/master/src/webui/master/static/js/controllers.js#L140-L160 , should we follow this?

        Show
        haosdent@gmail.com haosdent added a comment - The current behaviour is to set the start time once receive first TaskState https://github.com/apache/mesos/blob/master/src/webui/master/static/js/controllers.js#L140-L160 , should we follow this?
        Hide
        janisz Tomasz Janiszewski added a comment -

        I created first commit https://github.com/janisz/mesos/commit/0256557e89360bffe902b56ebf7e478cc34e8876#diff-28ebad5255c4c1a70b4abf35651198fdR7825 but I have no idea when start time should be updated? I changed updateStatus and if status is != STAGING and task already has not start time or update is earlier then start time gets updated with status timestamp. Is this good way or maybe it should be handle differently.

        Show
        janisz Tomasz Janiszewski added a comment - I created first commit https://github.com/janisz/mesos/commit/0256557e89360bffe902b56ebf7e478cc34e8876#diff-28ebad5255c4c1a70b4abf35651198fdR7825 but I have no idea when start time should be updated? I changed updateStatus and if status is != STAGING and task already has not start time or update is earlier then start time gets updated with status timestamp. Is this good way or maybe it should be handle differently.
        Hide
        haosdent@gmail.com haosdent added a comment -

        Cool!

        Show
        haosdent@gmail.com haosdent added a comment - Cool!
        Hide
        janisz Tomasz Janiszewski added a comment -

        Agree, definitely less hacky then assumption first status is start time. I will prepare separated patches for backend and UI.

        Show
        janisz Tomasz Janiszewski added a comment - Agree, definitely less hacky then assumption first status is start time. I will prepare separated patches for backend and UI.
        Hide
        alexr Alexander Rukletsov added a comment -

        I didn't say so, but if you mean extending Task message with start_time, I think it makes sense.

        Show
        alexr Alexander Rukletsov added a comment - I didn't say so, but if you mean extending Task message with start_time , I think it makes sense.
        Hide
        haosdent@gmail.com haosdent added a comment -

        As Alexander Rukletsov said, how about add an optional field start_time in message Task and set it in updateTask?

        Show
        haosdent@gmail.com haosdent added a comment - As Alexander Rukletsov said, how about add an optional field start_time in message Task and set it in updateTask ?
        Hide
        janisz Tomasz Janiszewski added a comment -

        Exactly. If first status in array will be status when task is running UI will work.

        Show
        janisz Tomasz Janiszewski added a comment - Exactly. If first status in array will be status when task is running UI will work.
        Hide
        alexr Alexander Rukletsov added a comment -

        If I understand correctly what you say, there is no way to get task start time from /state endpoint if health check is enabled. If this is true, we should fix this first, then the UI fix will be straightforward.

        Show
        alexr Alexander Rukletsov added a comment - If I understand correctly what you say, there is no way to get task start time from /state endpoint if health check is enabled. If this is true, we should fix this first, then the UI fix will be straightforward.
        Hide
        janisz Tomasz Janiszewski added a comment - - edited

        I think the right solution is to change condition in void Master::updateTask(Task* task, const StatusUpdate& update) that checks if new status has same state to check if new status has same state, reason and health and only if all this fields are the same drop previous message.

        Show
        janisz Tomasz Janiszewski added a comment - - edited I think the right solution is to change condition in void Master::updateTask(Task* task, const StatusUpdate& update) that checks if new status has same state to check if new status has same state, reason and health and only if all this fields are the same drop previous message.

          People

          • Assignee:
            bennoe Benno Evers
            Reporter:
            haosdent@gmail.com haosdent
            Shepherd:
            Alexander Rukletsov
          • Votes:
            0 Vote for this issue
            Watchers:
            5 Start watching this issue

            Dates

            • Created:
              Updated:
              Resolved:

              Development

                Agile