How you think Todd? Because of the tiering cost more or is it something to do w/ mslab allocations?
Extra container costs with the fact that we have extra CSLM objects for each row. I haven't measured but I bet there is some map-wide overhead that we're paying.
There are some other things I noticed that could be improved, though. In particular, CSLM optimizes for Comparable keys, so if you specify a custom comparator, then it has to wrap every key you insert with a wrapper object. Specializing CSLM for our purposes would easily save 64 bytes per entry on this.
Another thought I had was to do the following:
- have the actual entries in rowMap be Object-typed, rather than CSLMs.
- when the first insert happens, just insert the KeyValue itself (optimization for the case where each row has only one cell)
- when more inserts happen, swap it out for a proper container type
The proper container type's also interesting to consider here. We never have contention on update within a row, since the updates happen under a row lock, right? So, we can consider any map type that supports single-writer multiple-reader efficiently, which is a wider range of data structures than support multi-writer multi-reader. One possibility is snap trees or even copy-on-write sorted array lists.
What would you like to see test-wise proving this direction better than what we currently have? I could work up some tests?
Would be great if we had a benchmark focused on memstore-only which allowed a mix of the following operations from different threads:
- full scans
- range scans
- updates to existing rows which just touch 1 or a few columns
- updates to existing rows which touch lots of columns
- inserts of new rows (few or lots of columns)
But it's a bit of work to do all that. So, a microbenchmark which just timed something like having 20 threads each do a bunch of inserts with multi-column rows would at least show whether there's promise here.