Couldn't we do that the same way we did with compression options? I'm happy to make it a sub-task, I just want the main code to be settled before starting with that.
Is that going to have the same use case as it did per-CF? Meaning we would be saving the top of the cache, and that doesn't guarantee the system won't start almost cold...
Yes, it should really do exactly the same thing as the old option, except be global.
Do you think it's worth the effort of maintaining (and also persisting) such a descriptor -> id mapping exclusively for the key cache? It's already a very compact cache; e.g. even with descriptors > 50 bytes, we would only need ~20 MB to store 200,000 keys...
The thing is, 200,000 keys is not necessarily huge (especially given you can have more key cache entries than the total number of your keys, since there is an entry per-sstable). And 50 bytes per filename is not even close to a worst case, especially once we have CASSANDRA-2749. With, say, 1M keys, if each filename is 100 bytes, add the actual keys to that and we're talking > 100MB. Without being huge, that's a noticeable waste of I/O when the cache could easily be 10 times smaller. And if we add the values it will be worse.
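Back-of-envelope with those illustrative numbers:

  1,000,000 entries * ~100 bytes of path        = ~100 MB of filenames alone
  1,000,000 entries * 4 bytes for a compact id  = ~4 MB
  (the keys themselves are paid in both cases)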
There is also the fact that I would be ashamed to have to explain to users that we save the full path to the sstable with each entry, when they complain that the key cache on disk is more than 10 times bigger than the max size they configured in the yaml file.
At the very least, one easy win would be to save only the keyspace, columnFamily, version and generation parts of the filename, rather than the whole path to the sstable. But otherwise, when I talked about a descriptor -> id mapping, I was thinking of something simple, like saving two files instead of one: one would be the keys with each descriptor replaced by a compact id, the other would be the metadata, i.e., the descriptor -> id map. That would really just be an internal detail of the save function. But it's really just an idea.
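A rough sketch of what I mean, with invented names (Entry, save, the exact on-disk layout) purely for illustration, not actual Cassandra code:

import java.io.DataOutputStream;
import java.io.IOException;
import java.nio.ByteBuffer;
import java.util.HashMap;
import java.util.Map;

public class TwoFileKeyCacheSaver
{
    // Simplified stand-in for a key cache entry: the sstable descriptor
    // (here just its string form) plus the cached key bytes.
    public static final class Entry
    {
        public final String descriptor;
        public final ByteBuffer key;

        public Entry(String descriptor, ByteBuffer key)
        {
            this.descriptor = descriptor;
            this.key = key;
        }
    }

    // Writes two files: 'keysOut' gets (id, key) pairs with a 4-byte id in
    // place of the full descriptor; 'metaOut' gets the descriptor -> id map.
    public static void save(Iterable<Entry> entries, DataOutputStream keysOut, DataOutputStream metaOut) throws IOException
    {
        Map<String, Integer> ids = new HashMap<String, Integer>();
        for (Entry entry : entries)
        {
            Integer id = ids.get(entry.descriptor);
            if (id == null)
            {
                id = ids.size();
                ids.put(entry.descriptor, id);
            }
            keysOut.writeInt(id); // compact id instead of a ~100-byte path
            byte[] key = new byte[entry.key.remaining()];
            entry.key.duplicate().get(key);
            keysOut.writeInt(key.length);
            keysOut.write(key);
        }
        for (Map.Entry<String, Integer> e : ids.entrySet())
        {
            metaOut.writeInt(e.getValue());
            metaOut.writeUTF(e.getKey()); // each descriptor written exactly once
        }
    }
}

Load would then read the metadata file first to rebuild the id -> descriptor map, and stream the keys file in one pass.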
We do that because CLHM only allows measuring values; to do something about it we would need to rewrite the Weigher interface and change core semantics of CLHM...
Yeah, I know. But for the key cache, we use a constant weigher, counting 8 bytes for each "entry". I figured we could use some higher constant to get closer to the actual size taken by each entry in memory, even if we don't account for the exact size of the key. Typically, the KeyCacheKey structure will take "at least" 32 bytes in memory (it's more than that, but there is at least the DK token and a bunch of pointers...), so if we were to count each entry as, say, 40 or 48 bytes, I think we would be closer to the actual in-memory size. I just want to avoid people configuring 100MB for the key cache (ok, that would be a huge one) and actually having it be more like 1GB.
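A minimal sketch of that idea, assuming the CLHM Weigher interface (which weighs values, per the point above) and Long positions as the key cache values; 48 is just the illustrative constant:

import com.googlecode.concurrentlinkedhashmap.Weigher;

// Sketch only: weigh every entry at a flat 48 bytes rather than 8, as a
// rough proxy for the real in-memory footprint of KeyCacheKey + value +
// object overhead. The exact constant is debatable.
public class FlatKeyCacheWeigher implements Weigher<Long>
{
    private static final int ESTIMATED_ENTRY_SIZE = 48;

    public int weightOf(Long position)
    {
        return ESTIMATED_ENTRY_SIZE;
    }
}

With that, a maximumWeightedCapacity set to the configured byte budget would trip much closer to the real heap usage, without CLHM changes.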
Another option would be to reuse the technique used to measure memtables, but I'm fine leaving that to another ticket.
But saving values to avoid the 'two phase' key cache loading would require a common interface for values in the key/row caches, with serialize/deserialize functionality, which is not suitable e.g. for the ColumnFamily that we store in the row cache... That is why we still rely on SSTableReader.load, I think; saving values would limit the flexibility of the cache interface...
I fail to see what is so crazy about having the function that saves the cache have access to both key and value. It may require a bit of refactoring, but I don't see that as a good argument against it. Anyway, it's not a very big deal, but I still think the two phase loading is more fragile than it needs to be, and saving values would allow a proper reload.
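To be clear about the shape of the refactoring (names invented here, this is not existing code):

import java.io.DataInput;
import java.io.DataOutput;
import java.io.IOException;

// Hypothetical common interface so the save function can write values too,
// letting a reload rebuild cache entries in one pass instead of going back
// through SSTableReader.load.
public interface CacheValueSerializer<V>
{
    void serialize(V value, DataOutput out) throws IOException;
    V deserialize(DataInput in) throws IOException;
}

The row cache value (ColumnFamily) already has serialization code, so plugging it in should be mostly wiring; whether that's worth the loss of flexibility is exactly the disagreement above.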
This would mean that we would be caching even secondary index CFs, which, as was said, is not desired.
I don't understand. I'm just saying that a cfs object already has a reference to its metadata, so it's slightly cleaner to use that rather than querying Schema.instance with the table and column family names.
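Concretely, the difference is just this (accessor name from memory, to illustrate):

// slightly cleaner: the cfs object already references its metadata
CFMetaData cfm = cfs.metadata;

// rather than re-querying the schema by names:
// CFMetaData cfm = Schema.instance.getCFMetaData(tableName, columnFamilyName);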
Otherwise, I forgot to say that the patch adds the following useless lines in
/** Lock to allow migrations to block all flushing, so we can be sure not to write orphaned data files */
public final Lock flushLock = new ReentrantLock();