Details
New Feature
Status: Open
Normal
Resolution: Unresolved
None
Description
We (Benedict, Ariel and I) had some offline discussion about the next steps to further improve the row cache committed for CASSANDRA-7438 and identified the following points.
This ticket is basically a "note" not to forget these topics. The individual points should be handled in separate (sub) tickets.
- Permit access to off-heap data without deserialization. This should be the biggest win to improve reads - effectively no more deserialization of the whole cached value from off-heap. OHC issue #2
- Per-table knob that decides whether writes update the row cache or not. Could be a win if you have a workload with frequent reads against a few "hot" partitions but writes to many other partitions. Without such a knob, the row cache would fill up with useless data, which effectively reduces the cache hit ratio.
- Update cassandra.sh to preload jemalloc using LD_PRELOAD / DYLD_INSERT_LIBRARIES and use Unsafe for memory allocation. This removes JNA from the call stack. Additionally, we should make the same change in existing C* code for the same reason. (Note: JNA adds some overhead and has a synchronized block in each call. The synchronized block is going to be fixed in a future version, but JNA still doesn't come for free.) Feels like an LHF.
- Investigate whether key cache and counter cache can also use OHC. We could iterate towards a single cache implementation and maybe remove some code and decrease the potential number of configurations that can be run.
- Investigate whether RowCacheSentinel can be replaced with something better / "more native". The purpose of RowCacheSentinel seems to be to avoid races with other update operations that would invalidate the row before it is inserted into the cache. It's a workaround for the row cache not being write-through.
- Implement an efficient off-heap memory allocator. (see below)
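A minimal sketch of what the "use Unsafe for memory allocation" point could look like, assuming `sun.misc.Unsafe` is reachable via reflection. The class `UnsafeAlloc` and its method names are hypothetical and not taken from C* or OHC. `Unsafe.allocateMemory` goes to the process's native `malloc`, so if the start script preloads jemalloc, these calls should end up in jemalloc without JNA in between.

```java
import java.lang.reflect.Field;
import sun.misc.Unsafe;

// Hypothetical helper, for illustration only: off-heap allocation via
// sun.misc.Unsafe instead of JNA.
final class UnsafeAlloc {
    private static final Unsafe UNSAFE = loadUnsafe();

    private static Unsafe loadUnsafe() {
        try {
            // The singleton is kept in a private static field named "theUnsafe".
            Field f = Unsafe.class.getDeclaredField("theUnsafe");
            f.setAccessible(true);
            return (Unsafe) f.get(null);
        } catch (ReflectiveOperationException e) {
            throw new IllegalStateException("sun.misc.Unsafe not accessible", e);
        }
    }

    // Allocates 'bytes' bytes of native memory and returns the raw address.
    static long allocate(long bytes) {
        return UNSAFE.allocateMemory(bytes);
    }

    // Releases memory previously returned by allocate().
    static void free(long address) {
        UNSAFE.freeMemory(address);
    }

    // Writes a long at address + offset; the caller is responsible for bounds.
    static void putLong(long address, long offset, long value) {
        UNSAFE.putLong(address + offset, value);
    }

    // Reads a long from address + offset; the caller is responsible for bounds.
    static long getLong(long address, long offset) {
        return UNSAFE.getLong(address + offset);
    }
}
```

Usage would be along the lines of `long p = UnsafeAlloc.allocate(64); UnsafeAlloc.putLong(p, 0, 42L); UnsafeAlloc.free(p);`. Newer JDKs print deprecation warnings for these Unsafe methods, so this is only a sketch of the idea.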
Not big wins:
- Allow serialization of hot keys during auto save. Since saving of cached keys is a task that only runs infrequently (if at all), the win would not be great. It feels like LHF, but the win is low IMO.
- Use another replacement strategy. We had some discussion about using something else instead of LRU (timestamp, 2Q, LIRS, LRU+random). But either the overhead to manage these strategies outweighs the benefit or the win would be too low.
LHFs (should be fixed in the next few days)
- don't use row cache in unit tests (currently enabled in test/conf/cassandra.yaml)
- don't print the whole class path when jemalloc is not available (prints >40k class path on cassci for each unit test, since jemalloc is not available there - related to previous point)
As to incorporating memory management, I think we can actually do this very simply by merging it with our eviction strategy. If we allocate S arenas, each 1/S of the total capacity (where S is the number of segments), and partition each arena into pages of size K, we can make our eviction strategy operate over whole pages instead of individual items. This probably won't have any significant impact on eviction, especially with small-ish pages. The only slight complexity is dealing with allocations spanning multiple pages, but that shouldn't be too tricky. The nice thing about this approach is that, like our other decisions, it is very easily made obviously correct. It also gives us great locality for operations, with a high likelihood of cache presence for each allocation.
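To make the page idea a bit more concrete, here is a rough sketch of a single segment, under several assumptions that are not part of the discussion above: the class `PagedSegment` and all its members are hypothetical, pages are evicted in plain FIFO order (oldest written page first) rather than by any real LRU bookkeeping, the index lives on-heap, there is no locking, and allocations spanning multiple pages are simply rejected instead of handled.

```java
import java.nio.ByteBuffer;
import java.util.ArrayDeque;
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Hypothetical sketch: one segment's arena, split into fixed-size pages.
// Eviction frees a whole page at once instead of individual entries.
final class PagedSegment {
    private final ByteBuffer arena;          // this segment's share of off-heap memory
    private final int pageSize;              // K in the description above
    private final ArrayDeque<Integer> freePages = new ArrayDeque<>();
    private final ArrayDeque<Integer> usedPages = new ArrayDeque<>(); // oldest first
    private final List<List<String>> keysByPage = new ArrayList<>();
    private final Map<String, int[]> index = new HashMap<>(); // key -> {offset, length}
    private int currentPage = -1;
    private int currentOffset;               // write position inside currentPage

    PagedSegment(int pageCount, int pageSize) {
        this.arena = ByteBuffer.allocateDirect(pageCount * pageSize);
        this.pageSize = pageSize;
        for (int p = 0; p < pageCount; p++) {
            freePages.add(p);
            keysByPage.add(new ArrayList<>());
        }
    }

    // Stores a value; returns false for empty values or values larger than
    // one page (multi-page allocations are deliberately left out of this sketch).
    boolean put(String key, byte[] value) {
        if (value.length == 0 || value.length > pageSize) return false;
        if (currentPage < 0 || currentOffset + value.length > pageSize) openNewPage();
        int offset = currentPage * pageSize + currentOffset;
        ByteBuffer dst = arena.duplicate();
        dst.position(offset);
        dst.put(value);
        index.put(key, new int[] {offset, value.length});
        keysByPage.get(currentPage).add(key);
        currentOffset += value.length;
        return true;
    }

    // Returns a copy of the cached value, or null if absent or evicted.
    byte[] get(String key) {
        int[] loc = index.get(key);
        if (loc == null) return null;
        byte[] out = new byte[loc[1]];
        ByteBuffer src = arena.duplicate();
        src.position(loc[0]);
        src.get(out);
        return out;
    }

    private void openNewPage() {
        if (freePages.isEmpty()) evictOldestPage();
        currentPage = freePages.poll();
        usedPages.add(currentPage);
        currentOffset = 0;
    }

    // Drops every entry that still lives in the oldest page, then recycles it.
    private void evictOldestPage() {
        int victim = usedPages.poll();
        List<String> keys = keysByPage.get(victim);
        for (String key : keys) {
            int[] loc = index.get(key);
            // A key may have been rewritten into a newer page; keep it then.
            if (loc != null && loc[0] / pageSize == victim) index.remove(key);
        }
        keys.clear();
        freePages.add(victim);
    }
}
```

With two pages of 8 bytes and three 6-byte values, the third `put` recycles the first page, so the first key disappears while the other two stay readable. Freeing memory never touches individual entries, which is the property that makes the scheme easy to reason about.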
Attachments
1. Row cache compression | Open | Unassigned