Details
- Type: Sub-task
- Status: Closed
- Priority: Critical
- Resolution: Fixed
- Affects Version: 0.22
- Fix Version: None
- Packages: qpid-cpp-server-0.22-42, qpid-cpp-server-linearstore-0.22-42
Description
Pulp is an open source project that uses qpidd. Pulp needs a large number of queues (10K+), and these queues need to be durable. After creating thousands of durable queues, qpidd will not start again following a restart. Here is how to reproduce:
1. Install qpid-cpp-server and qpid-cpp-server-store
2. Start qpidd
3. Create a very large number (10K+) of unique, durable queues
4. Restart qpidd
5. Observe an error message such as the following:
Starting Qpid AMQP daemon: Daemon startup failed: Queue pulp.agent.5752dc04-7536-4e5c-b406-a0cd5d9c9119: recoverMessages() failed: jexception 0x0104 RecoveryManager::getFile() threw JERR__FILEIO: File read or write failure. (/var/lib/qpidd/qls/jrnl/pulp.agent.5752dc04-7536-4e5c-b406-a0cd5d9c9119/818fa4b0-3319-4478-b2b0-d2195f90f695.jrnl) (/builddir/build/BUILD/qpid-0.22/cpp/src/qpid/linearstore/MessageStoreImpl.cpp:1004)
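Step 3 above can be sketched with a small shell helper. This is only an illustration: the queue names are hypothetical (mirroring the pulp.agent.<UUID> pattern from the report), and it assumes qpid-config from qpid-tools is installed with a broker listening on localhost.

```shell
# Emit one "qpid-config add queue" command per durable queue.
# Queue names are illustrative, not the actual Pulp names.
gen_queue_cmds() {
  n="$1"
  i=1
  while [ "$i" -le "$n" ]; do
    echo "qpid-config add queue pulp.agent.queue-$i --durable"
    i=$((i + 1))
  done
}

# Step 3: generate 10K commands and feed them to a shell:
#   gen_queue_cmds 10000 | sh
# Steps 4-5: restart the broker and watch it fail to start:
#   service qpidd restart
```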
Looking at the /var/lib/qpidd/qls/jrnl/ directory, there are 2676 .jrnl files, 2640 of which start with pulp.agent. In our case, the bulk of the queues are named 'pulp.agent.<UUID>'.
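For reference, a journal-file count like the one above can be gathered with a small helper (the path is the one from the report; this is a sketch, not part of the store tooling):

```shell
# Count .jrnl files under a linearstore journal directory,
# descending into the per-queue subdirectories.
count_jrnl() {
  find "$1" -type f -name '*.jrnl' | wc -l
}

# e.g.: count_jrnl /var/lib/qpidd/qls/jrnl
```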
The expected behavior is that qpidd would start and run well even with a very large number of queues (1 million+).
Raising the file descriptor limit is a viable workaround, but eventually that limit will be exhausted too. It would be an architectural win if qpidd used a constant number of file descriptors, unaffected by the number of queues it manages.
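The workaround mentioned above usually means raising the per-process open-files limit for the qpidd user; a minimal sketch, assuming pam_limits is in use (the values are illustrative, not from the report):

```
# /etc/security/limits.conf (illustrative values)
qpidd  soft  nofile  65536
qpidd  hard  nofile  65536
```

After editing, restart qpidd so the new limit applies. As noted, this only delays the exhaustion rather than fixing the underlying design.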
Perhaps this could be introduced as a new journal type that would run slower but be more scalable. It could be packaged as qpid-cpp-server-crazy-scalable-but-slower-store.