The commutative properties of XOR make it possible to update the MT incrementally without having to read-on-write
Hang on, let's flesh this out.
I have an MD5 hash (or part of one, see below) per row in a MerkleTree TreeRange. I XOR all of these together to get my initial state, S. To update row A to A', I compute S xor hash(A) xor hash(A').
So I still need to read-on-write to compute hash(A), I just don't have to rehash everything else in the same TreeRange.
(I can imagine breaking this down further into XORing individual columns, which would mean we would only need to read the modified columns and not the entire row, but the principle is the same.)
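A minimal sketch of the update rule above, in plain Java with no Cassandra types (class and method names here are hypothetical, not actual Cassandra code). Because XOR is commutative, associative, and self-inverse, XORing hash(A) into S a second time removes A's old contribution, and XORing in hash(A') adds the new one:

```java
import java.security.MessageDigest;
import java.security.NoSuchAlgorithmException;

public class XorRangeHash {
    static final int HASH_LEN = 16; // MD5 digest size in bytes

    static byte[] md5(byte[] row) {
        try {
            return MessageDigest.getInstance("MD5").digest(row);
        } catch (NoSuchAlgorithmException e) {
            throw new AssertionError(e);
        }
    }

    // XOR b into a in place.
    static void xorInto(byte[] a, byte[] b) {
        for (int i = 0; i < HASH_LEN; i++)
            a[i] ^= b[i];
    }

    public static void main(String[] args) {
        byte[] rowA = "row A v1".getBytes();
        byte[] rowAPrime = "row A v2".getBytes();
        byte[] rowB = "row B".getBytes();

        // Initial state: S = hash(A) xor hash(B)
        byte[] s = new byte[HASH_LEN];
        xorInto(s, md5(rowA));
        xorInto(s, md5(rowB));

        // Incremental update: S' = S xor hash(A) xor hash(A')
        xorInto(s, md5(rowA));       // remove old A (this is the read-on-write)
        xorInto(s, md5(rowAPrime));  // add new A'

        // Recomputing the range from scratch gives the same state,
        // without ever having touched row B during the update.
        byte[] fresh = new byte[HASH_LEN];
        xorInto(fresh, md5(rowAPrime));
        xorInto(fresh, md5(rowB));
        System.out.println(java.util.Arrays.equals(s, fresh)); // prints "true"
    }
}
```

Note that row B never gets rehashed during the update, which is the saving being described: only the modified row has to be read.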
For num_tokens=256, that's 1 KB per range on average
I see, you mean vnode ranges. What I meant was MT TreeRanges... an MT can have 64k TRs. Ideally you would have 16 bytes (the MD5 size) per TR. You can throw away some of those bytes at the cost of false negatives, i.e., with a single byte per TR, two replicas will think they have the same data, even when they do not, 1/256 of the time.
But if you have 64k 1-byte TreeRanges, how do you fit that into 1 KB? Do you reduce the TR granularity further? 64k already feels too low... although this is mitigated somewhat by vnodes.
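Some rough arithmetic on the sizes being discussed, using only the numbers from this thread (64k TreeRanges, 16-byte MD5, a 1 KB budget):

```java
// Back-of-envelope sizing for per-TreeRange hash storage.
// Numbers come from the discussion above, not from actual Cassandra constants.
public class TreeRangeSizing {
    public static void main(String[] args) {
        int treeRanges = 1 << 16; // 64k TreeRanges per Merkle tree
        int md5Bytes = 16;        // full MD5 digest per TR

        // Full digests: 64k * 16 bytes = 1 MB per tree.
        System.out.println(treeRanges * md5Bytes / 1024 + " KB at 16 bytes/TR");

        // Truncated to a single byte per TR: still 64 KB, well over 1 KB.
        System.out.println(treeRanges / 1024 + " KB at 1 byte/TR");

        // The cost of truncation: two differing ranges collide on a
        // 1-byte hash with probability 1/256 (a repair false negative).
        System.out.println("false-negative odds: 1/" + (1 << 8));

        // Fitting a 1 KB budget at 1 byte per TR means only 1024 ranges.
        System.out.println(1024 + " TreeRanges fit in 1 KB at 1 byte each");
    }
}
```

So a 1 KB budget and 64k TreeRanges are incompatible even at 1 byte per TR, which is the tension the question above is pointing at: either the budget grows or the granularity drops.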
do have to reload $num_tokens ByteBuffers when creating the ColumnFamilyStore
And sync the ByteBuffer saving with ColumnFamily flushes so CommitLog replay matches up, I imagine.