Aurora / AURORA-148

Jobs should be able to set an UpdateConfig with abort_on_failure = True


Details

    • Type: Task
    • Status: Resolved
    • Priority: Minor
    • Resolution: Fixed
    • Affects Version/s: None
    • Fix Version/s: 0.5.0
    • Component/s: Client
    • Labels: None

    Description

      From a user in production at Twitter:

      Automatic rollback is undesirable for large jobs that take many hours to deploy (and many hours to roll back). Once the first few batches have gone in without issue, I never want an automatic rollback to occur.
      
      Anecdotally, in several months of deploying large jobs (200 to 2,200 instances), I've never had auto-rollback trigger because of a problem in the build after the first batch. I've only seen it trigger because of infrastructure issues, and rolling back in that case is never desirable.
      
      Additionally, if the new instances start failing midway through because of an issue with the new version, I never want an automatic rollback. I will /always/ want a human to investigate before taking any further action.
      
      Given this, I propose adding a setting to UpdateConfig that disables auto-rollback. Instead of rolling back, the update would abort with an error once max failures is reached.
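
      The proposed behaviour can be illustrated with a small simulation: instances are updated in batches, failures are counted, and once the failure budget is exhausted the update either rolls everything back (today's behaviour) or stops in place and reports an error (the proposal). This is a minimal sketch, not the Aurora client's implementation. The names UpdateConfig, UpdateResult, run_update, batch_size, max_total_failures and abort_on_failure are hypothetical stand-ins modelled on the issue title, and treating "max failures is reached" as "failures exceed the tolerated count" is an assumption.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class UpdateConfig:
    """Hypothetical stand-in for UpdateConfig; field names are illustrative only."""
    batch_size: int = 1
    max_total_failures: int = 0      # failures tolerated before the update gives up
    abort_on_failure: bool = False   # proposed setting: abort instead of rolling back


@dataclass
class UpdateResult:
    status: str          # "SUCCEEDED", "ROLLED_BACK" or "ABORTED"
    updated: List[int]   # instances left healthy on the new version
    failures: int


def run_update(instance_ok: List[bool], config: UpdateConfig) -> UpdateResult:
    """Simulate a batched rolling update.

    instance_ok[i] says whether instance i comes up healthy on the new version.
    """
    updated: List[int] = []
    failures = 0
    for start in range(0, len(instance_ok), config.batch_size):
        end = min(start + config.batch_size, len(instance_ok))
        for i in range(start, end):
            if instance_ok[i]:
                updated.append(i)
            else:
                failures += 1
        # Assumption: the update gives up once failures exceed the tolerated count.
        if failures > config.max_total_failures:
            if config.abort_on_failure:
                # Proposed behaviour: stop here and leave the job for a human.
                return UpdateResult("ABORTED", updated, failures)
            # Default behaviour: undo every instance updated so far.
            return UpdateResult("ROLLED_BACK", [], failures)
    return UpdateResult("SUCCEEDED", updated, failures)


if __name__ == "__main__":
    # Ten instances; instances 6 and 7 fail on the new version.
    health = [True] * 6 + [False, False] + [True] * 2
    print(run_update(health, UpdateConfig(batch_size=2, max_total_failures=1)))
    print(run_update(health, UpdateConfig(batch_size=2, max_total_failures=1,
                                          abort_on_failure=True)))
```

      With the default settings the fourth batch pushes the job over its failure budget and every updated instance is rolled back; with abort_on_failure set, the six healthy instances stay on the new version and the update stops, which is the "human investigates first" outcome the reporter asks for.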
      

    People

      Assignee: Unassigned
      Reporter: Joe Smith (yasumoto)