Description
am experiencing a repeated errors for workflows that were already killed.
012-11-05 21:01:27,975 DEBUG KillXCommand:545 - USER[isotopeuser] GROUP[-] TOKEN[] APP[MonitoringTestWorkflow1__0] JOB[0000713-121031144215857-oozie-isot-W] ACTION[-] Acquired lock for [0000712-121031144215857-oozie-isot-W] in [kill] 2012-11-05 21:01:27,975 DEBUG KillXCommand:545 - USER[isotopeuser] GROUP[-] TOKEN[] APP[MonitoringTestWorkflow1__0] JOB[0000713-121031144215857-oozie-isot-W] ACTION[-] Load state for [0000712-121031144215857-oozie-isot-W] 2012-11-05 21:01:28,475 DEBUG KillXCommand:545 - USER[isotopeuser] GROUP[-] TOKEN[] APP[MonitoringTestWorkflow1__0] JOB[0000712-121031144215857-oozie-isot-W] ACTION[-] Precondition check for command [kill] key [0000712-121031144215857-oozie-isot-W] 2012-11-05 21:01:28,475 DEBUG KillXCommand:545 - USER[isotopeuser] GROUP[-] TOKEN[] APP[MonitoringTestWorkflow1__0] JOB[0000712-121031144215857-oozie-isot-W] ACTION[-] Released lock for [0000712-121031144215857-oozie-isot-W] in [kill] 2012-11-05 21:01:28,475 WARN KillXCommand:542 - USER[isotopeuser] GROUP[-] TOKEN[] APP[MonitoringTestWorkflow1__0] JOB[0000712-121031144215857-oozie-isot-W] ACTION[-] E0725: Workflow instance can not be killed, 0000712-121031144215857-oozie-isot-W, Error Code: E0725 2012-11-05 21:01:28,475 DEBUG KillXCommand:545 - USER[isotopeuser] GROUP[-] TOKEN[] APP[MonitoringTestWorkflow1__0] JOB[0000712-121031144215857-oozie-isot-W] ACTION[-] Command [kill] key [0000712-121031144215857-oozie-isot-W] already executed for [org.apache.oozie.command.wf.KillXCommand@6903ada0] 2012-11-05 21:01:28,475 ERROR KillXCommand:536 - USER[isotopeuser] GROUP[-] TOKEN[] APP[MonitoringTestWorkflow1__0] JOB[0000712-121031144215857-oozie-isot-W] ACTION[-] Exception, java.lang.IllegalMonitorStateException at java.util.concurrent.locks.ReentrantReadWriteLock$Sync.tryRelease(ReentrantReadWriteLock.java:374) at java.util.concurrent.locks.AbstractQueuedSynchronizer.release(AbstractQueuedSynchronizer.java:1260) at java.util.concurrent.locks.ReentrantReadWriteLock$WriteLock.unlock(ReentrantReadWriteLock.java:1131) at org.apache.oozie.util.MemoryLocks$LockToken.release(MemoryLocks.java:59) at org.apache.oozie.command.XCommand.releaseLock(XCommand.java:222) at org.apache.oozie.command.XCommand.call(XCommand.java:295) at org.apache.oozie.service.CallableQueueService$CompositeCallable.call(CallableQueueService.java:326) at org.apache.oozie.service.CallableQueueService$CompositeCallable.call(CallableQueueService.java:255) at org.apache.oozie.service.CallableQueueService$CallableWrapper.run(CallableQueueService.java:175) at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1110) at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:603) at java.lang.Thread.run(Thread.java:722) 2012-11-05 21:01:28,475 WARN CallableQueueService$CompositeCallable:542 - USER[-] GROUP[-] TOKEN[-] APP[-] JOB[-] ACTION[-] exception callable [kill], E0607: Other error in operation [java.lang.IllegalMonitorStateException], {1} org.apache.oozie.command.CommandException: E0607: Other error in operation [java.lang.IllegalMonitorStateException], {1} at org.apache.oozie.command.XCommand.call(XCommand.java:317) at org.apache.oozie.service.CallableQueueService$CompositeCallable.call(CallableQueueService.java:326) at org.apache.oozie.service.CallableQueueService$CompositeCallable.call(CallableQueueService.java:255) at org.apache.oozie.service.CallableQueueService$CallableWrapper.run(CallableQueueService.java:175) at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1110) at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:603) at java.lang.Thread.run(Thread.java:722) Caused by: java.lang.IllegalMonitorStateException at java.util.concurrent.locks.ReentrantReadWriteLock$Sync.tryRelease(ReentrantReadWriteLock.java:374) at java.util.concurrent.locks.AbstractQueuedSynchronizer.release(AbstractQueuedSynchronizer.java:1260) at java.util.concurrent.locks.ReentrantReadWriteLock$WriteLock.unlock(ReentrantReadWriteLock.java:1131) at org.apache.oozie.util.MemoryLocks$LockToken.release(MemoryLocks.java:59) at org.apache.oozie.command.XCommand.releaseLock(XCommand.java:222) at org.apache.oozie.command.XCommand.call(XCommand.java:295) ... 6 more
What I did, is some time ago I killed a coordinator app whic had a lot of workflows in WAITING state, since then I see this error in the log file.
The log/exception is shown periodically, and it happen for the same job id (0000712-121031144215857-oozie-isot-W) and also several other job ids.
According to Mohammad Islam:
This is definitely a bug in XCommand class.
Most possibly, the bug was introduced when interrupt type command was added.
Would you please create a JIRA for this adding the contents of this email?
Probable reason:
Trying to release a lock w/o locking it first. It could happen on special case.
The bug is here in XCommand code:
finally{ if (isLockRequired()) { releaseLock(); } }
The "if" condition should be changed to something like this (used before locking):
if (isLockRequired() && !this.inInterruptMode())