Today, the messagev2 metadata table is the largest table in our schema (11GB on a 6-million-message installation).
Reads on it are abnormally slow, and it is the query we spend the most time executing when serving IMAP:
(see attached screenshot)
It is by far our most expensive query (per row), even though lightweight transactions (LWT) do not come into play!
Looking at the table stats:
Running `nodetool status`, we see that the total space occupied in Cassandra is `4.97 GiB`, so the messagev2 table occupies 40% of the total storage space. This is a lot compared to the other message metadata tables (~180MB, roughly 10x less).
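As a quick sanity check of these figures, the sketch below converts the 40% share of `4.97 GiB` into MB and compares it with the ~180MB of the other metadata tables. It assumes that "MB" means 10^6 bytes; the class and method names are illustrative only.

```java
// Back-of-the-envelope check of the sizes reported by `nodetool status`.
public class StorageShare {
    // 1 GiB expressed in MB, assuming MB = 10^6 bytes.
    static final double GIB_IN_MB = 1024.0 * 1024.0 * 1024.0 / 1_000_000.0;

    // Estimated size of messagev2 in MB, given the total cluster size and the table's share of it.
    static double messageV2SizeMb(double totalGib, double share) {
        return totalGib * share * GIB_IN_MB;
    }

    public static void main(String[] args) {
        double messageV2Mb = messageV2SizeMb(4.97, 0.40);
        double otherTableMb = 180.0; // approximate size of the other message metadata tables
        System.out.printf("messagev2 ~ %.0f MB, ratio ~ %.1fx%n", messageV2Mb, messageV2Mb / otherTableMb);
    }
}
```

This gives about 2.1 GB for messagev2 and a ratio of about 12x, in line with the order of magnitude stated above.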
Running a `nodetool tablehistograms` analysis:
We see that both the cell count and the byte count are high (the other message metadata tables are between 125 and 250 bytes, about 6 times less).
In our data model, each message has a set of properties, each composed of a namespace, a name and a value. These are stored as a list of UDTs, which is very space inefficient. The compression ratio (see above) does not compensate for this.
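One source of this inefficiency can be sketched in a few lines: with a list of (namespace, name, value) elements, every message stores the namespace and name strings of every property again, whereas dedicated columns carry that information once, in the schema. The sketch below only counts those repeated key bytes; the `Property` record mirrors the shape described above and is not the actual James class, the namespace and names are hypothetical, and per-cell storage overhead is ignored.

```java
import java.nio.charset.StandardCharsets;
import java.util.List;

public class PropertyOverhead {
    // Shape of one stored property as described above (illustrative, not the James type).
    record Property(String namespace, String name, String value) {}

    static int utf8Length(String s) {
        return s.getBytes(StandardCharsets.UTF_8).length;
    }

    // Bytes spent on namespace + name for one message. With one column per known
    // property, these bytes no longer need to be stored in every row.
    static int redundantKeyBytes(List<Property> properties) {
        int total = 0;
        for (Property p : properties) {
            total += utf8Length(p.namespace()) + utf8Length(p.name());
        }
        return total;
    }

    public static void main(String[] args) {
        // Hypothetical namespace, names and values, for illustration only.
        String ns = "org.example.mime";
        List<Property> props = List.of(
            new Property(ns, "content-type", "text"),
            new Property(ns, "content-subtype", "plain"),
            new Property(ns, "content-id", "<part1@example.com>"));
        System.out.println("Key bytes repeated per message: " + redundantKeyBytes(props));
    }
}
```

Even in this small example, the repeated keys (85 bytes) outweigh the values themselves, and this cost is paid again for every message.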
These properties are well defined, are set by the StoreMailboxManager, and only include:
Here is the conclusion:
- Slow reads on the messagev2 table have a large impact on IMAP performance (and on Cassandra performance)
- This slowness is due to the space the table occupies on disk (more data = slower reads)
- This extra space is due to an inefficient storage format for the property fields
- By restructuring the way we store these properties, we can reclaim disk space and thus improve read speed
- Remove unused properties (charset and MIME part delimiters are unused): see https://github.com/linagora/james-project/pull/3925
- Restructure CassandraMessageDAO (and the underlying table) to store each known property in a dedicated column instead of a UDT list. Unknown properties shall be rejected.
This avoids using a collection on a critical table, and should therefore significantly speed up related operations.
A data migration (to a new messagev3 table) will be needed.
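The proposed mapping (known properties go to their own columns, anything else is rejected) could look roughly like the sketch below. The `Column` enum and its property names are hypothetical placeholders for the real set of properties written by StoreMailboxManager; the sketch also assumes all known properties share one namespace, so only the name is matched. It is not the CassandraMessageDAO code.

```java
import java.util.EnumMap;
import java.util.Map;
import java.util.Optional;

public class KnownProperties {
    // Hypothetical column set: the real one is the list of properties set by
    // StoreMailboxManager, minus the unused ones removed in PR 3925.
    enum Column {
        MIME_TYPE("mime-type"),
        MIME_SUBTYPE("mime-subtype"),
        CONTENT_ID("content-id");

        final String propertyName;

        Column(String propertyName) {
            this.propertyName = propertyName;
        }

        static Optional<Column> forProperty(String name) {
            for (Column c : values()) {
                if (c.propertyName.equals(name)) {
                    return Optional.of(c);
                }
            }
            return Optional.empty();
        }
    }

    // Converts generic name -> value properties into one value per dedicated column.
    // Unknown properties are rejected instead of being stored in a generic list.
    static Map<Column, String> toColumns(Map<String, String> properties) {
        Map<Column, String> columns = new EnumMap<>(Column.class);
        properties.forEach((name, value) -> {
            Column column = Column.forProperty(name)
                .orElseThrow(() -> new IllegalArgumentException("Unknown property: " + name));
            columns.put(column, value);
        });
        return columns;
    }

    public static void main(String[] args) {
        System.out.println(toColumns(Map.of("mime-type", "text", "mime-subtype", "plain")));
        // toColumns(Map.of("charset", "utf-8")) would throw IllegalArgumentException.
    }
}
```

Rejecting unknown names at write time keeps the column set closed, which is what allows the table to drop the collection entirely.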
Definition of done
- Measure, within tests, the space occupied per message for both the old and the new table
- IMAP performance tests
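The "space occupied per message" metric boils down to dividing a table's on-disk size (for instance the live space reported by `nodetool tablestats`) by the number of messages it holds. A minimal sketch, with a hypothetical helper name, applied to the installation figures quoted at the top (taking GB as 10^9 bytes, an assumption):

```java
public class SpacePerMessage {
    // Average on-disk bytes per message for one table.
    static double bytesPerMessage(long tableSizeBytes, long messageCount) {
        if (messageCount <= 0) {
            throw new IllegalArgumentException("messageCount must be positive");
        }
        return (double) tableSizeBytes / messageCount;
    }

    public static void main(String[] args) {
        // 11GB for 6 million messages, GB taken as 10^9 bytes (assumption).
        System.out.printf("%.0f bytes/message%n", bytesPerMessage(11_000_000_000L, 6_000_000L));
    }
}
```

Computing the same figure for messagev2 and messagev3 over an identical test data set gives a direct before/after comparison.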