Description
https://www.mail-archive.com/server-dev@james.apache.org/msg70444.html
While auditing a slow Cassandra cluster on a performance test environment, I
noticed that ~25% of the data was garbage left by the Cassandra projection of
the RabbitMQ mailqueue, as the following table stats demonstrate:
Table: enqueuedmailsv3
    SSTable count: 327
    Space used (live): 4962189078
    Space used (total): 4962189078
    Space used by snapshots (total): 0
    Off heap memory used (total): 4716757
    SSTable Compression Ratio: 0.33271449206498704
    Number of partitions (estimate): 6246
Table: deletedmailsv2
    SSTable count: 69
    Space used (live): 1132247647
    Space used (total): 1132247647
    Space used by snapshots (total): 0
    Off heap memory used (total): 28743224
    SSTable Compression Ratio: 0.5380381348994696
    Number of partitions (estimate): 17669157
Together, these two tables take up to 6 GB for an empty mail queue. A bit of
cleanup would be welcome.
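As a quick sanity check of the 6 GB figure, the live space of the two tables can be summed. A minimal Python sketch, with the byte counts copied from the table stats above; the names `SPACE_USED_LIVE` and `total_bytes` are illustrative only:

```python
# Live on-disk space per table (bytes), copied from the table stats above.
SPACE_USED_LIVE = {
    "enqueuedmailsv3": 4_962_189_078,
    "deletedmailsv2": 1_132_247_647,
}


def total_bytes(stats: dict[str, int]) -> int:
    """Sum the live on-disk space of the mail queue projection tables."""
    return sum(stats.values())


total = total_bytes(SPACE_USED_LIVE)
# Report both decimal gigabytes (10^9) and binary gibibytes (2^30).
print(f"{total} bytes = {total / 10**9:.2f} GB = {total / 2**30:.2f} GiB")
```

This prints `6094436725 bytes = 6.09 GB = 5.68 GiB`. Snapshot space is 0 for both tables, so "live" and "total" space coincide and either column gives the same result.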
The following document presents the design of the RabbitMQ mailqueue:
https://github.com/apache/james-project/blob/master/src/adr/0031-distributed-mail-queue.md
The following document presents a design that solves the aforementioned
issue, but it was sadly never implemented:
https://github.com/apache/james-project/blob/master/src/adr/0032-distributed-mail-queue-cleanup.md
This also means that people with deduplication turned off never delete the
associated blobs.
I will fire a PR updating the status of this ADR. It will end up on
Linagora's short-to-medium-term TODO list.
Cheers,
Benoit