One of our applications using Apollo seems to trigger a very severe bug inside the broker.
The application is very simple: a few short-lived producers that send a few messages to a single topic, disconnect and start again after some time, plus one or two long-lived consumers using durable subscriptions. All this works fine initially, but after some time (typically 15 to 30 minutes, it varies) it stops working and the messages sent are no longer received.
On the surface, the broker seems to be working fine and other clients still work. However, the console reports odd statistics for the topic: the enqueued and dispatched counters keep growing while the dequeued counter does not change. Worse: the broker is in a bizarre state and cannot be stopped cleanly ("service apollo stop" yields "Could not stop process PID") and only "kill -9" can get rid of it.
I've tried to reproduce the problem with simple scripts but could not. I suspect a concurrency problem between the producers and the consumers.
I will attach a stack dump of the broker taken while it is in this weird state. If this is not enough, we can give you access to the broker the next time this happens.