Description
Some time ago, clients of our ActiveMQ instance locked up while browsing. Analysis of the log files showed a large number of the following exceptions:
java.util.NoSuchElementException
	at java.util.LinkedList.remove(LinkedList.java:788)
	at java.util.LinkedList.removeFirst(LinkedList.java:134)
	at org.apache.activemq.broker.region.Queue.getNextBrowserDispatch(Queue.java:1341)
	at org.apache.activemq.broker.region.Queue.iterate(Queue.java:1463)
	at org.apache.activemq.thread.PooledTaskRunner.runTask(PooledTaskRunner.java:122)
	at org.apache.activemq.thread.PooledTaskRunner$1.run(PooledTaskRunner.java:43)
	at java.util.concurrent.ThreadPoolExecutor$Worker.runTask(ThreadPoolExecutor.java:886)
	at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:908)
	at java.lang.Thread.run(Thread.java:637)
These exceptions occurred before the lockup.
It seems there is a problem in the Queue class, which uses a non-thread-safe LinkedList collection. Additions to and removals from this collection are wrapped only by a shared readLock, which several threads can hold at once, so there is no guard against concurrent modification. There is also a possible race condition between the isEmpty and removeFirst calls when getNextBrowserDispatch is used concurrently (if that is possible).
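To illustrate the suspected race, here is a minimal sketch that replays the problematic interleaving deterministically in a single thread. It is not the ActiveMQ source: the class name CheckThenActRace, the method replayInterleaving and the "callers" A and B are hypothetical, and only the LinkedList isEmpty/removeFirst pattern is taken from the description above.

```java
import java.util.LinkedList;
import java.util.NoSuchElementException;

// Hypothetical stand-in for the broker code; not the actual ActiveMQ Queue class.
class CheckThenActRace {

    // Replays, in one thread, an interleaving of two callers A and B that both
    // see a single pending entry. Returns true if A's removeFirst() fails.
    static boolean replayInterleaving() {
        LinkedList<String> browserDispatches = new LinkedList<>();
        browserDispatches.add("dispatch-1");

        boolean aSawEntry = !browserDispatches.isEmpty(); // A: check passes
        boolean bSawEntry = !browserDispatches.isEmpty(); // B: check passes too

        if (bSawEntry) {
            browserDispatches.removeFirst();              // B: takes the only entry
        }
        try {
            if (aSawEntry) {
                browserDispatches.removeFirst();          // A: the list is now empty
            }
            return false;
        } catch (NoSuchElementException e) {
            return true;                                  // same exception type as in the log
        }
    }

    public static void main(String[] args) {
        System.out.println("A failed with NoSuchElementException: " + replayInterleaving());
    }
}
```

A shared read lock does not rule out this ordering, because both callers may hold it at the same time. Truly concurrent structural modification of a LinkedList can additionally corrupt its internal state, which this single-threaded replay does not show.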
I think the easiest fix is to switch from LinkedList to ConcurrentLinkedQueue and to access the collection through the java.util.Queue methods, because they allow the isEmpty check and the removal to happen in a single call. I am attaching a patch that does this. I've left the readLocks in place in case they are used to block writes somewhere else, but they are no longer needed for concurrency control over the new collection.
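The idea can be sketched as follows. This is not the attached patch, only an illustration under assumptions: the class PollBasedDispatch, its methods and the String element type are hypothetical, while ConcurrentLinkedQueue.poll() is the standard call that removes and returns the head, or returns null when the queue is empty.

```java
import java.util.Queue;
import java.util.concurrent.ConcurrentLinkedQueue;
import java.util.concurrent.atomic.AtomicInteger;

// Hypothetical illustration; names and element type are not from ActiveMQ.
class PollBasedDispatch {
    private final Queue<String> browserDispatches = new ConcurrentLinkedQueue<>();

    void add(String dispatch) {
        browserDispatches.add(dispatch);
    }

    // Single-step replacement for "if (!isEmpty()) removeFirst()":
    // poll() returns null instead of throwing when nothing is left.
    String getNext() {
        return browserDispatches.poll();
    }

    // Drains the queue from several threads and returns how many entries were taken.
    static int drainConcurrently(int entries, int threads) {
        PollBasedDispatch dispatch = new PollBasedDispatch();
        for (int i = 0; i < entries; i++) {
            dispatch.add("dispatch-" + i);
        }
        AtomicInteger taken = new AtomicInteger();
        Thread[] workers = new Thread[threads];
        for (int i = 0; i < threads; i++) {
            workers[i] = new Thread(() -> {
                while (dispatch.getNext() != null) {
                    taken.incrementAndGet();
                }
            });
            workers[i].start();
        }
        for (Thread worker : workers) {
            try {
                worker.join();
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt();
                throw new IllegalStateException(e);
            }
        }
        return taken.get();
    }

    public static void main(String[] args) {
        System.out.println("entries taken: " + drainConcurrently(1000, 4));
    }
}
```

Each entry is handed to exactly one caller, and a caller that finds the queue empty gets null rather than a NoSuchElementException, so callers need a null check where the isEmpty test used to be.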