Uploaded image for project: 'Aurora'
  1. Aurora
  2. AURORA-1506

Job updater does not handle DELETED tasks properly

    XMLWordPrintableJSON

Details

    • Bug
    • Status: Resolved
    • Major
    • Resolution: Fixed
    • None
    • 0.11.0
    • Scheduler
    • None
    • Twitter Aurora Q4'15 Sprint 16
    • 5

    Description

      We have observed cases when job updater fails to correctly process tasks that were deleted from the store (e.g. PENDING or THROTTLED being killed). Sample log:

      I0929 00:01:59.101 THREAD4453 org.apache.aurora.scheduler.updater.JobUpdateControllerImpl$6.execute: Forwarding task change for role/staging/job/1
      I0929 00:01:59.103 THREAD4453 org.apache.aurora.scheduler.updater.InstanceUpdater.handleActualAndDesiredPresent: Task is in terminal state FAILED
      I0929 00:01:59.103 THREAD4453 org.apache.aurora.scheduler.updater.InstanceUpdater.addFailureAndCheckIfFailed: Observed updated task failure.
      I0929 00:01:59.103 THREAD4453 org.apache.aurora.common.util.StateMachine$Builder$1.execute: Instance 1 state machine transition WORKING -> FAILED
      I0929 00:01:59.103 THREAD4453 org.apache.aurora.scheduler.updater.JobUpdateControllerImpl.evaluateUpdater: IJobUpdateKey{job=IJobKey{role=role, environment=staging, name=job}, id=38889720-36e4-4640-99c5-399ebf3a68a0} evaluation result: EvaluationResult{status=FAILED, sideEffects={1=SideEffect{action=Optional.absent(), statusChanges=[FAILED]}}}
      I0929 00:01:59.103 THREAD4453 org.apache.aurora.scheduler.updater.JobUpdateControllerImpl.changeJobUpdateStatus: Update IJobUpdateKey{job=IJobKey{role=role, environment=staging, name=job}, id=38889720-36e4-4640-99c5-399ebf3a68a0} is now in state ROLLING_BACK
      I0929 00:01:59.112 THREAD4453 org.apache.aurora.common.util.StateMachine$Builder$1.execute: Instance 1 state machine transition IDLE -> WORKING
      I0929 00:01:59.112 THREAD4453 org.apache.aurora.scheduler.updater.OneWayJobUpdater.startNextInstanceGroup: Changed working set for update to [1]
      I0929 00:01:59.112 THREAD4453 org.apache.aurora.scheduler.updater.JobUpdateControllerImpl.evaluateUpdater: IJobUpdateKey{job=IJobKey{role=role, environment=staging, name=job}, id=38889720-36e4-4640-99c5-399ebf3a68a0} evaluation result: EvaluationResult{status=WORKING, sideEffects={1=SideEffect{action=Optional.of(KILL_TASK), statusChanges=[WORKING]}}}
      I0929 00:01:59.113 THREAD4453 org.apache.aurora.scheduler.updater.JobUpdateControllerImpl.evaluateUpdater: Executing side-effects for update of IJobUpdateKey{job=IJobKey{role=role, environment=staging, name=job}, id=38889720-36e4-4640-99c5-399ebf3a68a0}: {1=SideEffect{action=Optional.of(KILL_TASK), statusChanges=[WORKING]}}
      I0929 00:01:59.113 THREAD4453 org.apache.aurora.scheduler.updater.InstanceActionHandler$KillTask.getReevaluationDelay: Killing IInstanceKey{jobKey=IJobKey{role=role, environment=staging, name=job}, instanceId=1} while ROLLING_BACK
      I0929 00:01:59.113 THREAD4453 org.apache.aurora.common.util.StateMachine$Builder$1.execute: 1443484918618-role-staging-job-1-5d1301fb-5bd8-4d51-b4b9-fd9115d766f0 state machine transition THROTTLED -> KILLING
      I0929 00:01:59.113 THREAD4453 org.apache.aurora.scheduler.state.TaskStateMachine.addFollowup: Adding work command DELETE for 1443484918618-role-staging-job-1-5d1301fb-5bd8-4d51-b4b9-fd9115d766f0
      

      Later on, when a task change event is received, the updater attempts to run evaluator again but this time can't find a task that has just being DELETED:

      I0929 00:01:59.382 THREAD4454 org.apache.aurora.scheduler.updater.JobUpdateControllerImpl$6.execute: Forwarding task change for role/staging/job/1
      I0929 00:01:59.385 THREAD4454 org.apache.aurora.scheduler.updater.JobUpdateControllerImpl.evaluateUpdater: IJobUpdateKey{job=IJobKey{role=role, environment=staging, name=job}, id=38889720-36e4-4640-99c5-399ebf3a68a0} evaluation result: EvaluationResult{status=WORKING, sideEffects={1=SideEffect{action=Optional.of(KILL_TASK), statusChanges=[]}}}
      I0929 00:01:59.385 THREAD4454 org.apache.aurora.scheduler.updater.JobUpdateControllerImpl.evaluateUpdater: Executing side-effects for update of IJobUpdateKey{job=IJobKey{role=role, environment=staging, name=job}, id=38889720-36e4-4640-99c5-399ebf3a68a0}: {1=SideEffect{action=Optional.of(KILL_TASK), statusChanges=[]}}
      E0929 00:01:59.386 THREAD4454 org.apache.aurora.scheduler.updater.JobUpdateEventSubscriber.taskChangedState: Failed to handle state change: java.util.NoSuchElementException
      java.util.NoSuchElementException
              at com.google.common.collect.Iterators$1.next(Iterators.java:80)
              at com.google.common.collect.Iterators.getOnlyElement(Iterators.java:302)
              at com.google.common.collect.Iterables.getOnlyElement(Iterables.java:289)
              at org.apache.aurora.scheduler.updater.InstanceActionHandler$KillTask.getReevaluationDelay(InstanceActionHandler.java:104)
              at org.apache.aurora.scheduler.updater.JobUpdateControllerImpl.evaluateUpdater(JobUpdateControllerImpl.java:668)
              at org.apache.aurora.scheduler.updater.JobUpdateControllerImpl.access$1400(JobUpdateControllerImpl.java:108)
              at org.apache.aurora.scheduler.updater.JobUpdateControllerImpl$6.execute(JobUpdateControllerImpl.java:356)
              at org.apache.aurora.scheduler.storage.Storage$MutateWork$NoResult.apply(Storage.java:137)
              at org.apache.aurora.scheduler.storage.Storage$MutateWork$NoResult.apply(Storage.java:132)
              at org.apache.aurora.scheduler.storage.log.LogStorage$24.apply(LogStorage.java:614)
              at org.apache.aurora.scheduler.storage.log.LogStorage$24.apply(LogStorage.java:611)
              at org.apache.aurora.scheduler.storage.db.DbStorage.transactionedWrite(DbStorage.java:146)
              at org.mybatis.guice.transactional.TransactionalMethodInterceptor.invoke(TransactionalMethodInterceptor.java:101)
              at org.apache.aurora.scheduler.storage.db.DbStorage$2.doWithGateClosed(DbStorage.java:162)
              at org.apache.aurora.scheduler.async.GatingDelayExecutor.closeDuring(GatingDelayExecutor.java:62)
              at org.apache.aurora.scheduler.storage.db.DbStorage.write(DbStorage.java:158)
              at org.apache.aurora.common.inject.TimedInterceptor.invoke(TimedInterceptor.java:84)
              at org.apache.aurora.scheduler.storage.log.LogStorage.doInTransaction(LogStorage.java:611)
              at org.apache.aurora.scheduler.storage.log.LogStorage.write(LogStorage.java:644)
              at org.apache.aurora.scheduler.storage.CallOrderEnforcingStorage.write(CallOrderEnforcingStorage.java:130)
              at org.apache.aurora.scheduler.updater.JobUpdateControllerImpl.instanceChanged(JobUpdateControllerImpl.java:347)
              at org.apache.aurora.scheduler.updater.JobUpdateControllerImpl.instanceChangedState(JobUpdateControllerImpl.java:332)
              at org.apache.aurora.scheduler.updater.JobUpdateEventSubscriber.taskChangedState(JobUpdateEventSubscriber.java:56)
              at sun.reflect.GeneratedMethodAccessor121.invoke(Unknown Source)
              at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
              at java.lang.reflect.Method.invoke(Method.java:497)
              at com.google.common.eventbus.EventSubscriber.handleEvent(EventSubscriber.java:74)
              at com.google.common.eventbus.SynchronizedEventSubscriber.handleEvent(SynchronizedEventSubscriber.java:47)
              at com.google.common.eventbus.EventBus.dispatch(EventBus.java:322)
              at com.google.common.eventbus.AsyncEventBus.access$001(AsyncEventBus.java:34)
              at com.google.common.eventbus.AsyncEventBus$1.run(AsyncEventBus.java:117)
              at java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:511)
              at java.util.concurrent.FutureTask.run(FutureTask.java:266)
              at java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask.access$201(ScheduledThreadPoolExecutor.java:180)
              at java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask.run(ScheduledThreadPoolExecutor.java:293)
              at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142)
              at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617)
              at java.lang.Thread.run(Thread.java:745)
      

      Attachments

        Activity

          People

            maximk Maxim Khutornenko
            maximk Maxim Khutornenko
            Votes:
            0 Vote for this issue
            Watchers:
            2 Start watching this issue

            Dates

              Created:
              Updated:
              Resolved: