Uploaded image for project: 'Aurora'
  1. Aurora
  2. AURORA-1507

Instance collision stack trace in updater

    XMLWordPrintableJSON

Details

    • Bug
    • Status: Resolved
    • Major
    • Resolution: Fixed
    • None
    • 0.11.0
    • Scheduler
    • None

    Description

      I have spent some time investigating this error but still have no reasonable leads. Dumping relevant logs here in case anyone encountered it or has better understanding of what's going on here:

      I0929 01:43:54.700 THREAD4752 org.apache.aurora.scheduler.updater.InstanceActionHandler$AddTask.getReevaluationDelay: Adding instance IInstanceKey{jobKey=IJobKey{role=role, environment=prod, name=job}, instanceId=26} while ROLLING_FORWARD
      
      I0929 01:43:08.902 THREAD5112 org.apache.aurora.common.util.StateMachine$Builder$1.execute: 1443229138618-role-prod-job-13-41a49672-2698-4f82-b976-4e1e8edd2d5b state machine transition KILLING -> KILLED
      I0929 01:43:08.902 THREAD5112 org.apache.aurora.scheduler.state.TaskStateMachine.addFollowup: Adding work command SAVE_STATE for 1443229138618-role-prod-job-13-41a49672-2698-4f82-b976-4e1e8edd2d5b
      
      I0929 01:43:48.543 THREAD4752 org.apache.aurora.scheduler.updater.JobUpdateControllerImpl$6.execute: Forwarding task change for role/prod/job/25
      I0929 01:43:48.558 THREAD4752 org.apache.aurora.scheduler.updater.OneWayJobUpdater.evaluateInstances: Ignoring state change for instance outside working set: 25
      
      I0929 01:43:50.982 THREAD4752 org.apache.aurora.scheduler.storage.mem.MemTaskStore.fetchTasks: Query took 27 ms: TaskQuery(owner:null, role:role, environment:prod, jobName:job, taskIds:null, statuses:[PENDING, DRAINING, ASSIGNED, STARTING, RUNNING, PREEMPTING, RESTARTING, THROTTLED, KILLING], instanceIds:[13], slaveHosts:null, jobKeys:null, offset:0, limit:0)
      I0929 01:43:50.983 THREAD4752 org.apache.aurora.common.util.StateMachine$Builder$1.execute: Instance 13 state machine transition IDLE -> WORKING
      
      I0929 01:43:54.667 THREAD4752 org.apache.aurora.scheduler.updater.InstanceActionHandler$AddTask.getReevaluationDelay: Adding instance IInstanceKey{jobKey=IJobKey{role=role, environment=prod, name=job}, instanceId=13} while ROLLING_FORWARD
      I0929 01:43:54.686 THREAD4752 org.apache.aurora.common.util.StateMachine$Builder$1.execute: 1443491034668-role-prod-job-13-ee1cd4f8-a902-47d2-8aae-fca7d53930b9 state machine transition INIT -> PENDING
      I0929 01:43:54.687 THREAD4752 org.apache.aurora.scheduler.state.TaskStateMachine.addFollowup: Adding work command SAVE_STATE for 1443491034668-role-prod-job-13-ee1cd4f8-a902-47d2-8aae-fca7d53930b9
      
      I0929 01:43:57.058 THREAD4751 org.apache.aurora.scheduler.updater.JobUpdateControllerImpl$6.execute: Forwarding task change for role/prod/job/13
      
      I0929 01:43:58.159 THREAD4751 org.apache.aurora.scheduler.updater.JobUpdateControllerImpl.evaluateUpdater: Executing side-effects for update of IJobUpdateKey{job=IJobKey{role=role, environment=prod, name=job}, id=3042db11-c6f9-4344-a2c0-7e584a49acc4}: {13=SideEffect{action=Optional.of(ADD_TASK), statusChanges=[]}
      
      I0929 01:43:58.160 THREAD4751 org.apache.aurora.scheduler.updater.InstanceActionHandler$AddTask.getReevaluationDelay: Adding instance IInstanceKey{jobKey=IJobKey{role=role, environment=prod, name=job}, instanceId=13} while ROLLING_FORWARD
      E0929 01:43:58.178 THREAD4751 org.apache.aurora.scheduler.updater.JobUpdateEventSubscriber.taskChangedState: Failed to handle state change: java.lang.IllegalArgumentException: Instance ID collision detected.
      java.lang.IllegalArgumentException: Instance ID collision detected.
              at org.apache.aurora.scheduler.state.StateManagerImpl.insertPendingTasks(StateManagerImpl.java:138)
              at org.apache.aurora.scheduler.updater.InstanceActionHandler$AddTask.getReevaluationDelay(InstanceActionHandler.java:85)
              at org.apache.aurora.scheduler.updater.JobUpdateControllerImpl.evaluateUpdater(JobUpdateControllerImpl.java:668)
              at org.apache.aurora.scheduler.updater.JobUpdateControllerImpl.access$1400(JobUpdateControllerImpl.java:108)
              at org.apache.aurora.scheduler.updater.JobUpdateControllerImpl$6.execute(JobUpdateControllerImpl.java:356)
              at org.apache.aurora.scheduler.storage.Storage$MutateWork$NoResult.apply(Storage.java:137)
              at org.apache.aurora.scheduler.storage.Storage$MutateWork$NoResult.apply(Storage.java:132)
              at org.apache.aurora.scheduler.storage.log.LogStorage$24.apply(LogStorage.java:614)
              at org.apache.aurora.scheduler.storage.log.LogStorage$24.apply(LogStorage.java:611)
              at org.apache.aurora.scheduler.storage.db.DbStorage.transactionedWrite(DbStorage.java:146)
              at org.mybatis.guice.transactional.TransactionalMethodInterceptor.invoke(TransactionalMethodInterceptor.java:101)
              at org.apache.aurora.scheduler.storage.db.DbStorage$2.doWithGateClosed(DbStorage.java:162)
              at org.apache.aurora.scheduler.async.GatingDelayExecutor.closeDuring(GatingDelayExecutor.java:62)
              at org.apache.aurora.scheduler.storage.db.DbStorage.write(DbStorage.java:158)
              at org.apache.aurora.common.inject.TimedInterceptor.invoke(TimedInterceptor.java:84)
              at org.apache.aurora.scheduler.storage.log.LogStorage.doInTransaction(LogStorage.java:611)
              at org.apache.aurora.scheduler.storage.log.LogStorage.write(LogStorage.java:644)
              at org.apache.aurora.scheduler.storage.CallOrderEnforcingStorage.write(CallOrderEnforcingStorage.java:130)
              at org.apache.aurora.scheduler.updater.JobUpdateControllerImpl.instanceChanged(JobUpdateControllerImpl.java:347)
              at org.apache.aurora.scheduler.updater.JobUpdateControllerImpl.instanceChangedState(JobUpdateControllerImpl.java:332)
              at org.apache.aurora.scheduler.updater.JobUpdateEventSubscriber.taskChangedState(JobUpdateEventSubscriber.java:56)
              at sun.reflect.GeneratedMethodAccessor121.invoke(Unknown Source)
              at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
              at java.lang.reflect.Method.invoke(Method.java:497)
              at com.google.common.eventbus.EventSubscriber.handleEvent(EventSubscriber.java:74)
              at com.google.common.eventbus.SynchronizedEventSubscriber.handleEvent(SynchronizedEventSubscriber.java:47)
              at com.google.common.eventbus.EventBus.dispatch(EventBus.java:322)
              at com.google.common.eventbus.AsyncEventBus.access$001(AsyncEventBus.java:34)
              at com.google.common.eventbus.AsyncEventBus$1.run(AsyncEventBus.java:117)
              at java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:511)
              at java.util.concurrent.FutureTask.run(FutureTask.java:266)
              at java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask.access$201(ScheduledThreadPoolExecutor.java:180)
              at java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask.run(ScheduledThreadPoolExecutor.java:293)
              at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142)
              at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617)
              at java.lang.Thread.run(Thread.java:745)
      

      Attachments

        Activity

          People

            maximk Maxim Khutornenko
            maximk Maxim Khutornenko
            Votes:
            0 Vote for this issue
            Watchers:
            2 Start watching this issue

            Dates

              Created:
              Updated:
              Resolved: