With current Qpid Broker versions (up to 8.0.4 including) when BDB HA cluster is distributed across different DCs, a consumer throughput falls significantly behind the producer . As result, the total system throughput (determined by the consumer) can be very low.
Such behavior is caused by implementation specifics of BDB message store:
- BDB HA transactions are synchronous for the majority of the nodes. (The messaging transaction is committed when majority of the nodes in the cluster acknowledge the transaction on their side)
- The deletion of dequeued messages from the store is synchronous and impacted by the DC latency.
- 2 separate underlying store transactions are used to delete each individual message data. Thus, if message consumption happens in transactional batches of N, the messages are dequeued from the queue in the transacted batch of N messages. However, after message dequeuing, the unreferenced message data is deleted using its own store transactions (one for content and another for metadata). As result for a batch of N, the store runs 1 + N*2 transactions. The N*2 transactions are synchronous and executed one after another. If a latency between data centers is high, it adds a corresponding delay to each store transaction. As result, the message removal takes time. Only after deletion of N messages, the broker sends commit confirmation to the client. The message enqueing works differently.