Aurora / AURORA-1890

Job Update Pulse History is initialized to no pulses on scheduler recovery


Details

    • Type: Bug
    • Status: Resolved
    • Priority: Major
    • Resolution: Fixed
    • None
    • 0.18.0
    • None
    • None

    Description

      I have experienced the following problem with pulse updates. To reproduce:
      1. Create an update with a pulse timeout of 1h
      2. Send a pulse to get the update going.
      3. Fail over the scheduler immediately afterwards.
      4. Observe that the update is awaiting another pulse right after the failover.

      This happens because JobUpdateControllerImpl keeps pulse history and state only in memory, in PulseHandler. Nothing about pulses is persisted, so on scheduler startup the pulse state is reset to "no pulse received".

      We can solve this by inferring the timestamp of the last pulse from the job update events when the scheduler recovers.
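
      A minimal sketch of the proposed inference, written in Python for brevity rather than in the scheduler's own code. It assumes each job update event carries a timestamp and a status, and that an update whose latest event is an active rolling state must have been unblocked by a pulse no earlier than that event. The names UpdateEvent, infer_last_pulse_ms and is_blocked are hypothetical and are not taken from the Aurora code base; the status strings are assumptions modelled on the "awaiting pulse" behaviour described above.

      ```python
      from dataclasses import dataclass
      from typing import Optional, Sequence

      # Assumed status names; treat them as placeholders for whatever the
      # scheduler actually records in its job update events.
      AWAITING_PULSE = {"ROLL_FORWARD_AWAITING_PULSE", "ROLL_BACK_AWAITING_PULSE"}
      ACTIVE = {"ROLLING_FORWARD", "ROLLING_BACK"}


      @dataclass(frozen=True)
      class UpdateEvent:
          """Hypothetical stand-in for a persisted job update event."""
          timestamp_ms: int
          status: str


      def infer_last_pulse_ms(events: Sequence[UpdateEvent]) -> Optional[int]:
          """Best-guess timestamp of the last pulse, or None if none can be inferred.

          If the most recent event shows the update actively rolling, a pulse
          must have arrived by then, so that event's timestamp is used as a
          conservative lower bound for the last pulse.
          """
          if not events:
              return None
          latest = max(events, key=lambda e: e.timestamp_ms)
          if latest.status in ACTIVE:
              return latest.timestamp_ms
          # Awaiting a pulse, paused, or terminal: no pulse can be inferred.
          return None


      def is_blocked(last_pulse_ms: Optional[int], now_ms: int, pulse_timeout_ms: int) -> bool:
          """True if the update should wait for a pulse before proceeding."""
          if last_pulse_ms is None:
              return True
          return now_ms - last_pulse_ms >= pulse_timeout_ms


      if __name__ == "__main__":
          one_hour_ms = 60 * 60 * 1000
          history = [
              UpdateEvent(1_000, "ROLL_FORWARD_AWAITING_PULSE"),
              UpdateEvent(2_000, "ROLLING_FORWARD"),  # pulse received around here
          ]
          # Scheduler recovers one minute after the pulse.
          recovered = infer_last_pulse_ms(history)
          print(recovered, is_blocked(recovered, 2_000 + 60_000, one_hour_ms))
      ```

      Because the event timestamp can only be earlier than or equal to the real pulse time, seeding the pulse state this way may make the update block slightly sooner than it would have without a failover, but never later.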


          People

            zmanji Zameer Manji
            zmanji Zameer Manji
