Aurora / AURORA-1023

Releasing the update lock trips up the scheduler updater


Details

    • Type: Bug
    • Status: Resolved
    • Priority: Critical
    • Resolution: Fixed
    • Affects Version/s: None
    • Fix Version/s: 0.8.0
    • Component/s: Scheduler
    • Labels: None
    • Sprint: Twitter Aurora Q1'15 Sprint 1, Twitter Aurora Q1'15 Sprint 2
    • Story Points: 5

    Description

      Here is the faulty sequence:

      • User starts a scheduler job update and pauses it while it is still in progress
      • User runs the "aurora job cancel-update" command, thus releasing the update lock
      • User starts a new scheduler job update

      At this point, any attempt to abort or pause an active update results in the following error [1]:

      vagrant@vagrant-ubuntu-trusty-64:~$ aurora beta-update abort devcluster/www-data/prod/hello
       INFO] Aborting update for: devcluster/www-data/prod/hello
      Failed to abort update due to error:
      	expected one element but was: <JobUpdateSummary(updateId:4b7fdc14-428f-44e4-9261-908b606f47e2, jobKey:JobKey(role:www-data, environment:prod, name:hello), user:UNSECURE, state:JobUpdateState(status:ROLLING_FORWARD, createdTimestampMs:1421450382234, lastModifiedTimestampMs:1421450382234)), JobUpdateSummary(updateId:3c9c2fa2-8e51-4c13-8440-94364205a37b, jobKey:JobKey(role:www-data, environment:prod, name:hello), user:UNSECURE, state:JobUpdateState(status:ROLL_FORWARD_PAUSED, createdTimestampMs:1421450304935, lastModifiedTimestampMs:1421450324055))>
      

      The only way to recover from this state is either to wait for the active job update to reach a terminal state or to force it into one by running another cancel-update.
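
      The failure and the recovery path can be modeled with a minimal Python sketch. This is not Aurora code: `UpdateSummary`, `find_active_update`, and `get_only_element` are hypothetical names, and the sketch assumes the scheduler looks up "the" active update for a job with a check that expects exactly one match, which is what the error message suggests. The statuses ROLLING_FORWARD and ROLL_FORWARD_PAUSED come from the error output; ROLLED_FORWARD is assumed here as a terminal status.

```python
from dataclasses import dataclass

# Statuses taken from the error output above; treating both as
# "active" (non-terminal) is an assumption of this sketch.
ACTIVE_STATUSES = {"ROLLING_FORWARD", "ROLL_FORWARD_PAUSED"}


@dataclass
class UpdateSummary:
    update_id: str
    job_key: str
    status: str


def get_only_element(items):
    """Return the single item, or raise if there is not exactly one."""
    if len(items) != 1:
        raise ValueError(f"expected one element but was: <{items}>")
    return items[0]


def find_active_update(store, job_key):
    """Hypothetical lookup that assumes exactly one active update per job."""
    active = [u for u in store
              if u.job_key == job_key and u.status in ACTIVE_STATUSES]
    return get_only_element(active)


job = "devcluster/www-data/prod/hello"
store = [
    UpdateSummary("3c9c2fa2", job, "ROLL_FORWARD_PAUSED"),  # paused, lock released
    UpdateSummary("4b7fdc14", job, "ROLLING_FORWARD"),      # newly started update
]

# Two non-terminal updates for the same job make the lookup ambiguous.
try:
    find_active_update(store, job)
    lookup_failed = False
except ValueError:
    lookup_failed = True

# Once the active update reaches a terminal status (assumed name:
# ROLLED_FORWARD), only the paused update matches and the lookup works again.
store[1].status = "ROLLED_FORWARD"
recovered = find_active_update(store, job)
```

      In this model, abort and pause keep failing for as long as both updates remain non-terminal, which matches the recovery options described above.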

      While "cancel-update" will eventually go away along with the client updater, we do have a problem during the migration period. A possible (though ugly) short-term workaround could be calling "abortJobUpdate" from the "releaseLock" RPC.
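
      A rough sketch of that workaround, using a toy in-memory model rather than the real scheduler: `SchedulerModel` and all of its methods are hypothetical stand-ins, with `release_lock` and `abort_job_update` only mirroring the names of the "releaseLock" and "abortJobUpdate" RPCs. The ABORTED status name is an assumption of the sketch.

```python
# Non-terminal statuses, as seen in the error output; an assumption here.
ACTIVE_STATUSES = {"ROLLING_FORWARD", "ROLL_FORWARD_PAUSED"}


class SchedulerModel:
    """Toy in-memory stand-in for the scheduler; not Aurora code."""

    def __init__(self):
        self.locks = set()   # job keys that currently hold an update lock
        self.updates = {}    # update_id -> [job_key, status]

    def start_update(self, update_id, job_key):
        if job_key in self.locks:
            raise RuntimeError("job is locked by another update")
        self.locks.add(job_key)
        self.updates[update_id] = [job_key, "ROLLING_FORWARD"]

    def pause_update(self, update_id):
        self.updates[update_id][1] = "ROLL_FORWARD_PAUSED"

    def abort_job_update(self, job_key):
        # Move every non-terminal update of this job to ABORTED (assumed name).
        for entry in self.updates.values():
            if entry[0] == job_key and entry[1] in ACTIVE_STATUSES:
                entry[1] = "ABORTED"

    def release_lock(self, job_key):
        # The proposed workaround: abort any active update before the lock
        # is dropped, so a later update never coexists with a stale one.
        self.abort_job_update(job_key)
        self.locks.discard(job_key)

    def active_updates(self, job_key):
        return [uid for uid, (key, status) in self.updates.items()
                if key == job_key and status in ACTIVE_STATUSES]


job = "devcluster/www-data/prod/hello"
scheduler = SchedulerModel()
scheduler.start_update("u1", job)   # user starts an update...
scheduler.pause_update("u1")        # ...and pauses it
scheduler.release_lock(job)         # cancel-update releases the lock
scheduler.start_update("u2", job)   # user starts a new update
active = scheduler.active_updates(job)
```

      With the abort folded into the lock release, only the new update stays active in this model, so an exactly-one-active-update lookup stays unambiguous.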

      [1] - https://github.com/apache/incubator-aurora/blob/master/src/main/java/org/apache/aurora/scheduler/updater/JobUpdateControllerImpl.java#L295-L296

          People

            wfarner Bill Farner
            maximk Maxim Khutornenko