Details
Description
We observed the symptom of a job update that was stuck in ROLLING_FORWARD, but was not active. Looking in scheduler logs revealed a DB-level deadlock:
E0715 19:04:53.085 THREAD1471 org.apache.aurora.scheduler.updater.JobUpdateEventSubscriber.startUp: Failed to resume job updates: org.apache.aurora.scheduler.storage.Storage$StorageException: ### Error querying database. Cause: org.h2.jdbc.JdbcSQLException: Deadlock detected. The current transaction was rolled back. Details: " Session #3 (user: ) on thread RowGarbageCollector RUNNING is waiting to lock PUBLIC.LOCKS (shared) while locking PUBLIC.JOB_KEYS (exclusive). Session #2 (user: ) on thread JobUpdateEventSubscriber STARTING is waiting to lock PUBLIC.JOB_KEYS (shared) while locking PUBLIC.JOB_INSTANCE_UPDATE_EVENTS (exclusive), PUBLIC.JOB_UPDATE_EVENTS (exclusive), PUBLIC.LOCKS (exclusive), PUBLIC.JOB_UPDATE_LOCKS (exclusive)."; SQL statement: SELECT lock_token FROM job_update_locks AS l INNER JOIN job_updates u ON l.update_row_id = u.id INNER JOIN job_keys AS j ON j.id = u.job_key_id WHERE u.update_id = ? AND j.role = ? AND j.environment = ? AND j.name = ? [40001-187] ### The error may exist in org/apache/aurora/scheduler/storage/db/JobUpdateDetailsMapper.xml ### The error may involve defaultParameterMap ### The error occurred while setting parameters ### SQL: SELECT lock_token FROM job_update_locks AS l INNER JOIN job_updates u ON l.update_row_id = u.id INNER JOIN job_keys AS j ON j.id = u.job_key_id WHERE u.update_id = ? AND j.role = ? AND j.environment = ? AND j.name = ? ### Cause: org.h2.jdbc.JdbcSQLException: Deadlock detected. The current transaction was rolled back. Details: " Session #3 (user: ) on thread RowGarbageCollector RUNNING is waiting to lock PUBLIC.LOCKS (shared) while locking PUBLIC.JOB_KEYS (exclusive). Session #2 (user: ) on thread JobUpdateEventSubscriber STARTING is waiting to lock PUBLIC.JOB_KEYS (shared) while locking PUBLIC.JOB_INSTANCE_UPDATE_EVENTS (exclusive), PUBLIC.JOB_UPDATE_EVENTS (exclusive), PUBLIC.LOCKS (exclusive), PUBLIC.JOB_UPDATE_LOCKS (exclusive)."; SQL statement: SELECT lock_token FROM job_update_locks AS l INNER JOIN job_updates u ON l.update_row_id = u.id INNER JOIN job_keys AS j ON j.id = u.job_key_id WHERE u.update_id = ? AND j.role = ? AND j.environment = ? AND j.name = ? [40001-187] org.apache.aurora.scheduler.storage.Storage$StorageException: ### Error querying database. Cause: org.h2.jdbc.JdbcSQLException: Deadlock detected. The current transaction was rolled back. Details: " Session #3 (user: ) on thread RowGarbageCollector RUNNING is waiting to lock PUBLIC.LOCKS (shared) while locking PUBLIC.JOB_KEYS (exclusive). Session #2 (user: ) on thread JobUpdateEventSubscriber STARTING is waiting to lock PUBLIC.JOB_KEYS (shared) while locking PUBLIC.JOB_INSTANCE_UPDATE_EVENTS (exclusive), PUBLIC.JOB_UPDATE_EVENTS (exclusive), PUBLIC.LOCKS (exclusive), PUBLIC.JOB_UPDATE_LOCKS (exclusive)."; SQL statement: SELECT lock_token FROM job_update_locks AS l INNER JOIN job_updates u ON l.update_row_id = u.id INNER JOIN job_keys AS j ON j.id = u.job_key_id WHERE u.update_id = ? AND j.role = ? AND j.environment = ? AND j.name = ? [40001-187] ### The error may exist in org/apache/aurora/scheduler/storage/db/JobUpdateDetailsMapper.xml ### The error may involve defaultParameterMap ### The error occurred while setting parameters ### SQL: SELECT lock_token FROM job_update_locks AS l INNER JOIN job_updates u ON l.update_row_id = u.id INNER JOIN job_keys AS j ON j.id = u.job_key_id WHERE u.update_id = ? AND j.role = ? AND j.environment = ? AND j.name = ? ### Cause: org.h2.jdbc.JdbcSQLException: Deadlock detected. The current transaction was rolled back. Details: " Session #3 (user: ) on thread RowGarbageCollector RUNNING is waiting to lock PUBLIC.LOCKS (shared) while locking PUBLIC.JOB_KEYS (exclusive). Session #2 (user: ) on thread JobUpdateEventSubscriber STARTING is waiting to lock PUBLIC.JOB_KEYS (shared) while locking PUBLIC.JOB_INSTANCE_UPDATE_EVENTS (exclusive), PUBLIC.JOB_UPDATE_EVENTS (exclusive), PUBLIC.LOCKS (exclusive), PUBLIC.JOB_UPDATE_LOCKS (exclusive)."; SQL statement: SELECT lock_token FROM job_update_locks AS l INNER JOIN job_updates u ON l.update_row_id = u.id INNER JOIN job_keys AS j ON j.id = u.job_key_id WHERE u.update_id = ? AND j.role = ? AND j.environment = ? AND j.name = ? [40001-187] at org.apache.aurora.scheduler.storage.db.DbStorage.write(DbStorage.java:145) at org.mybatis.guice.transactional.TransactionalMethodInterceptor.invoke(TransactionalMethodInterceptor.java:101) at com.twitter.common.inject.TimedInterceptor.invoke(TimedInterceptor.java:87) at org.apache.aurora.scheduler.storage.log.LogStorage.doInTransaction(LogStorage.java:602) at org.apache.aurora.scheduler.storage.log.LogStorage.write(LogStorage.java:635) at org.apache.aurora.scheduler.storage.CallOrderEnforcingStorage.write(CallOrderEnforcingStorage.java:130) at org.apache.aurora.scheduler.updater.JobUpdateControllerImpl.systemResume(JobUpdateControllerImpl.java:263) at org.apache.aurora.scheduler.updater.JobUpdateEventSubscriber.startUp(JobUpdateEventSubscriber.java:85) at com.google.common.util.concurrent.AbstractIdleService$2$1.run(AbstractIdleService.java:54) at com.google.common.util.concurrent.Callables$3.run(Callables.java:93) at java.lang.Thread.run(Thread.java:745) Caused by: org.apache.ibatis.exceptions.PersistenceException: ### Error querying database. Cause: org.h2.jdbc.JdbcSQLException: Deadlock detected. The current transaction was rolled back. Details: " Session #3 (user: ) on thread RowGarbageCollector RUNNING is waiting to lock PUBLIC.LOCKS (shared) while locking PUBLIC.JOB_KEYS (exclusive). Session #2 (user: ) on thread JobUpdateEventSubscriber STARTING is waiting to lock PUBLIC.JOB_KEYS (shared) while locking PUBLIC.JOB_INSTANCE_UPDATE_EVENTS (exclusive), PUBLIC.JOB_UPDATE_EVENTS (exclusive), PUBLIC.LOCKS (exclusive), PUBLIC.JOB_UPDATE_LOCKS (exclusive)."; SQL statement: SELECT lock_token FROM job_update_locks AS l INNER JOIN job_updates u ON l.update_row_id = u.id INNER JOIN job_keys AS j ON j.id = u.job_key_id WHERE u.update_id = ? AND j.role = ? AND j.environment = ? AND j.name = ? [40001-187] ### The error may exist in org/apache/aurora/scheduler/storage/db/JobUpdateDetailsMapper.xml ### The error may involve defaultParameterMap ### The error occurred while setting parameters ### SQL: SELECT lock_token FROM job_update_locks AS l INNER JOIN job_updates u ON l.update_row_id = u.id INNER JOIN job_keys AS j ON j.id = u.job_key_id WHERE u.update_id = ? AND j.role = ? AND j.environment = ? AND j.name = ? ### Cause: org.h2.jdbc.JdbcSQLException: Deadlock detected. The current transaction was rolled back. Details: " Session #3 (user: ) on thread RowGarbageCollector RUNNING is waiting to lock PUBLIC.LOCKS (shared) while locking PUBLIC.JOB_KEYS (exclusive). Session #2 (user: ) on thread JobUpdateEventSubscriber STARTING is waiting to lock PUBLIC.JOB_KEYS (shared) while locking PUBLIC.JOB_INSTANCE_UPDATE_EVENTS (exclusive), PUBLIC.JOB_UPDATE_EVENTS (exclusive), PUBLIC.LOCKS (exclusive), PUBLIC.JOB_UPDATE_LOCKS (exclusive)."; SQL statement: SELECT lock_token FROM job_update_locks AS l INNER JOIN job_updates u ON l.update_row_id = u.id INNER JOIN job_keys AS j ON j.id = u.job_key_id WHERE u.update_id = ? AND j.role = ? AND j.environment = ? AND j.name = ? [40001-187] at org.apache.ibatis.exceptions.ExceptionFactory.wrapException(ExceptionFactory.java:30) at org.apache.ibatis.session.defaults.DefaultSqlSession.selectList(DefaultSqlSession.java:122) at org.apache.ibatis.session.defaults.DefaultSqlSession.selectList(DefaultSqlSession.java:113) at org.apache.ibatis.session.defaults.DefaultSqlSession.selectOne(DefaultSqlSession.java:73) at sun.reflect.GeneratedMethodAccessor29.invoke(Unknown Source) at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43) at java.lang.reflect.Method.invoke(Method.java:497) at org.apache.ibatis.session.SqlSessionManager$SqlSessionInterceptor.invoke(SqlSessionManager.java:334) at com.sun.proxy.$Proxy57.selectOne(Unknown Source) at org.apache.ibatis.session.SqlSessionManager.selectOne(SqlSessionManager.java:165) at org.apache.ibatis.binding.MapperMethod.execute(MapperMethod.java:69) at org.apache.ibatis.binding.MapperProxy.invoke(MapperProxy.java:53) at org.apache.aurora.scheduler.storage.db.$Proxy64.selectLockToken(Unknown Source) at org.apache.aurora.scheduler.storage.db.DbJobUpdateStore.getLockToken(DbJobUpdateStore.java:257) at com.twitter.common.inject.TimedInterceptor.invoke(TimedInterceptor.java:87) at org.apache.aurora.scheduler.storage.ForwardingStore.getLockToken(ForwardingStore.java:178) at org.apache.aurora.scheduler.updater.JobUpdateControllerImpl.changeJobUpdateStatus(JobUpdateControllerImpl.java:454) at org.apache.aurora.scheduler.updater.JobUpdateControllerImpl.access$1000(JobUpdateControllerImpl.java:108) at org.apache.aurora.scheduler.updater.JobUpdateControllerImpl$4.execute(JobUpdateControllerImpl.java:283) at org.apache.aurora.scheduler.storage.Storage$MutateWork$NoResult.apply(Storage.java:131) at org.apache.aurora.scheduler.storage.Storage$MutateWork$NoResult$Quiet.apply(Storage.java:148) at org.apache.aurora.scheduler.storage.log.LogStorage$24.apply(LogStorage.java:605) at org.apache.aurora.scheduler.storage.log.LogStorage$24.apply(LogStorage.java:602) at org.apache.aurora.scheduler.storage.db.DbStorage.write(DbStorage.java:143) ... 10 more Caused by: org.h2.jdbc.JdbcSQLException: Deadlock detected. The current transaction was rolled back. Details: " Session #3 (user: ) on thread RowGarbageCollector RUNNING is waiting to lock PUBLIC.LOCKS (shared) while locking PUBLIC.JOB_KEYS (exclusive). Session #2 (user: ) on thread JobUpdateEventSubscriber STARTING is waiting to lock PUBLIC.JOB_KEYS (shared) while locking PUBLIC.JOB_INSTANCE_UPDATE_EVENTS (exclusive), PUBLIC.JOB_UPDATE_EVENTS (exclusive), PUBLIC.LOCKS (exclusive), PUBLIC.JOB_UPDATE_LOCKS (exclusive)."; SQL statement: SELECT lock_token FROM job_update_locks AS l INNER JOIN job_updates u ON l.update_row_id = u.id INNER JOIN job_keys AS j ON j.id = u.job_key_id WHERE u.update_id = ? AND j.role = ? AND j.environment = ? AND j.name = ? [40001-187] at org.h2.message.DbException.getJdbcSQLException(DbException.java:345) at org.h2.message.DbException.get(DbException.java:179) at org.h2.message.DbException.get(DbException.java:155) at org.h2.table.RegularTable.doLock1(RegularTable.java:496) at org.h2.table.RegularTable.lock(RegularTable.java:472) at org.h2.table.TableFilter.lock(TableFilter.java:146) at org.h2.command.dml.Select.queryWithoutCache(Select.java:671) at org.h2.command.dml.Query.query(Query.java:322) at org.h2.command.dml.Query.query(Query.java:290) at org.h2.command.dml.Query.query(Query.java:36) at org.h2.command.CommandContainer.query(CommandContainer.java:90) at org.h2.command.Command.executeQuery(Command.java:197) at org.h2.jdbc.JdbcPreparedStatement.execute(JdbcPreparedStatement.java:192) at org.apache.ibatis.executor.statement.PreparedStatementHandler.query(PreparedStatementHandler.java:62) at org.apache.ibatis.executor.statement.RoutingStatementHandler.query(RoutingStatementHandler.java:78) at org.apache.ibatis.executor.SimpleExecutor.doQuery(SimpleExecutor.java:62) at org.apache.ibatis.executor.BaseExecutor.queryFromDatabase(BaseExecutor.java:303) at org.apache.ibatis.executor.BaseExecutor.query(BaseExecutor.java:154) at org.apache.ibatis.executor.CachingExecutor.query(CachingExecutor.java:102) at org.apache.ibatis.executor.CachingExecutor.query(CachingExecutor.java:82) at org.apache.ibatis.session.defaults.DefaultSqlSession.selectList(DefaultSqlSession.java:120) ... 32 more
Attachments
Issue Links
- blocks
-
AURORA-1078 0.9.0 Release Candidate
- Resolved