In case it is hard for you to reproduce, here are the relevant statistics, obtained via JMX at the point where the unit test hangs after the first batch of 1000 messages has been processed:
For the work-items queue (which has two worker thread consumers):
This makes sense: the enqueue count of 1001 reflects the message the master sent to the work-items queue to tell the workers to start processing the second batch of 1000 items, but for whatever reason this message hasn't been dispatched to a worker.
For the two worker subscriptions on this queue, here are their stats:
I can also confirm that all 3 threads (two workers, one master) are waiting in receive(), by dumping the thread stacks:
-locked <0x199c50d8> (a java.lang.Object)
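If it helps anyone capture the same kind of dump from inside the test when it hangs, here is a minimal sketch using only the standard java.lang.management API. The class name ThreadStackDump and the method dumpAllThreads are my own and are not part of the unit tests; the output format is only an approximation of what jstack prints.

```java
import java.lang.management.ManagementFactory;
import java.lang.management.ThreadInfo;
import java.lang.management.ThreadMXBean;

// Hypothetical helper (not part of the attached unit tests).
public class ThreadStackDump {

    /** Returns a textual dump of every live thread: name, state, awaited lock, full stack. */
    public static String dumpAllThreads() {
        ThreadMXBean threads = ManagementFactory.getThreadMXBean();
        // Only request lock details if this JVM supports collecting them.
        boolean monitors = threads.isObjectMonitorUsageSupported();
        boolean synchronizers = threads.isSynchronizerUsageSupported();

        StringBuilder out = new StringBuilder();
        for (ThreadInfo info : threads.dumpAllThreads(monitors, synchronizers)) {
            out.append('"').append(info.getThreadName()).append("\" ")
               .append(info.getThreadState());
            // getLockName() is null unless the thread is blocked or waiting on a lock.
            if (info.getLockName() != null) {
                out.append(" on ").append(info.getLockName());
            }
            out.append('\n');
            // Print every frame; ThreadInfo.toString() truncates long stacks.
            for (StackTraceElement frame : info.getStackTrace()) {
                out.append("    at ").append(frame).append('\n');
            }
            out.append('\n');
        }
        return out.toString();
    }

    public static void main(String[] args) {
        System.out.print(dumpAllThreads());
    }
}
```

Calling ThreadStackDump.dumpAllThreads() from a watchdog thread or a test timeout handler should show whether the worker and master threads are all parked in receive() at the moment of the hang.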
Looking at the numbers, it really looks like a new message has been put into the queue, but hasn't been dispatched.
Is there any more information you need apart from the above and the unit tests provided to squash this issue?