Description
org.apache.activemq.broker.region.cursors.CursorDurableTest sometimes fails on Windows with the error:
Exception in thread "Persistence Adaptor Task" java.lang.NullPointerException
at org.apache.activemq.store.amq.AMQMessageStore$4.execute(AMQMessageStore.java:381)
at org.apache.activemq.util.TransactionTemplate.run(TransactionTemplate.java:44)
at org.apache.activemq.store.amq.AMQMessageStore.doAsyncWrite(AMQMessageStore.java:374)
at org.apache.activemq.store.amq.AMQMessageStore.asyncWrite(AMQMessageStore.java:341)
at org.apache.activemq.store.amq.AMQMessageStore$1.iterate(AMQMessageStore.java:95)
at org.apache.activemq.thread.PooledTaskRunner.runTask(PooledTaskRunner.java:122)
at org.apache.activemq.thread.PooledTaskRunner$1.run(PooledTaskRunner.java:43)
at java.util.concurrent.ThreadPoolExecutor$Worker.runTask(ThreadPoolExecutor.java:650)
at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:675)
at java.lang.Thread.run(Thread.java:595)
Tests run: 1, Failures: 1, Errors: 0, Skipped: 0, Time elapsed: 303.672 sec <<< FAILURE!
The problem appears to be in the interaction between wakeup and runTask in PooledTaskRunner.
In runTask, iterating is set to false in a finally block, and queued is checked in a separate synchronized block.
If wakeup is called in this window, it can set queued, find iterating false, and submit the task for execution; runTask will then find queued true and submit it as well, so the task runs twice in parallel.
The fix is to include the queued check in the finally block so that iterating and queued are updated and checked in the same synchronized block. I will attach a patch with this fix.
I attempted to reproduce this problem with a unit test, but without real success: the window is quite small. I will include the unit test in case it can be improved upon.
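For illustration, the shape of the fix described above can be sketched roughly as follows. This is a simplified, hypothetical runner, not the actual PooledTaskRunner source or the attached patch: the class name SketchTaskRunner, the nested Task interface and the lock field are assumptions made for the example, and shutdown handling and the per-run iteration limit are left out.

```java
import java.util.concurrent.Executor;

/** Hypothetical, simplified runner; NOT the real PooledTaskRunner. */
final class SketchTaskRunner {
    /** Assumed task contract: return true if more work remains. */
    interface Task {
        boolean iterate();
    }

    private final Object lock = new Object();
    private final Executor executor;
    private final Task task;
    private final Runnable runnable = this::runTask;
    private boolean queued;     // a run has been requested but not yet started
    private boolean iterating;  // a run is currently in progress

    SketchTaskRunner(Executor executor, Task task) {
        this.executor = executor;
        this.task = task;
    }

    void wakeup() {
        synchronized (lock) {
            if (queued) {
                return; // a run is already pending
            }
            queued = true;
            // Only submit if no run is in progress; otherwise the running
            // task re-submits itself when it finishes.
            if (!iterating) {
                executor.execute(runnable);
            }
        }
    }

    private void runTask() {
        synchronized (lock) {
            queued = false;
            iterating = true;
        }
        boolean more = false;
        try {
            more = task.iterate();
        } finally {
            // Clearing iterating and checking queued under one lock
            // acquisition closes the window: wakeup can no longer observe
            // iterating == false before this method has examined queued.
            synchronized (lock) {
                iterating = false;
                if (more) {
                    queued = true;
                }
                if (queued) {
                    executor.execute(runnable);
                }
            }
        }
    }
}
```

In the racy variant, the second synchronized block in runTask would only clear iterating, and the queued check would sit in a further synchronized block after the finally; a wakeup arriving between the two blocks would then submit the task a second time.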
chirino merged a fix yesterday that addresses the symptom of this issue in a different way:
http://svn.apache.org/viewvc?view=rev&revision=650956
The added synchronisation means that parallel calls to asyncWrite from the PooledTaskRunner are serialised on the method access.
The fix proposed here addresses the root cause and can remove the need for that synchronisation.
FYI: in the test, the parallel calls can come from flush() and from the asyncWrite task.