Details
Type: Improvement
Status: Open
Priority: Normal
Resolution: Unresolved
Description
As discussed briefly on CASSANDRA-6230, it should be quite possible to construct a single log that can serve as commit log, hints log and batch log. The basic idea would be to write sequentially, marking each message as a member of one or more logical logs. We would keep a separate, efficient (possibly embedded) ledger for invalidating log records. As entire log segments become invalidated, we simply delete them; the rest we let accumulate until we hit a high watermark and have segments that are at least half empty, at which point we begin rewriting the emptiest.
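To make the idea above concrete, here is a minimal in-memory sketch. It is not an existing Cassandra API: the class `UnifiedLog`, its `Kind` enum, the record-count segment capacity and the segment-count watermark are all assumptions made for illustration. A real implementation would work on bytes on disk and persist the ledger.

```java
import java.util.ArrayList;
import java.util.EnumSet;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Hypothetical model of one log shared by commit, hints and batch logs.
// All names are illustrative; none of this is Cassandra code.
final class UnifiedLog {
    enum Kind { COMMIT, HINT, BATCH }

    private static final class Record {
        final long id;
        final EnumSet<Kind> liveIn; // logical logs that still need this record
        Segment home;               // segment currently holding the record
        Record(long id, EnumSet<Kind> liveIn) { this.id = id; this.liveIn = liveIn; }
    }

    private static final class Segment {
        final List<Record> records = new ArrayList<>();
        int live;                   // records still needed by at least one log
    }

    private final int capacity;      // records per segment (assumed unit)
    private final int highWatermark; // segment count above which rewriting starts
    private final List<Segment> segments = new ArrayList<>();
    private final Map<Long, Record> ledger = new HashMap<>(); // invalidation ledger
    private long nextId;
    private long rewritten;

    UnifiedLog(int capacity, int highWatermark) {
        this.capacity = capacity;
        this.highWatermark = highWatermark;
    }

    /** Appends one record sequentially, tagged with the logical logs it belongs to. */
    long append(Kind first, Kind... rest) {
        Record r = new Record(nextId++, EnumSet.of(first, rest));
        ledger.put(r.id, r);
        place(r);
        rewriteIfNeeded();
        return r.id;
    }

    /** Marks a record as no longer needed by one logical log. */
    void invalidate(long id, Kind kind) {
        Record r = ledger.get(id);
        if (r == null || !r.liveIn.remove(kind) || !r.liveIn.isEmpty()) return;
        ledger.remove(id);
        Segment home = r.home;
        home.live--;
        // A full segment with no live records is simply deleted.
        if (home.live == 0 && home.records.size() == capacity) segments.remove(home);
    }

    private void place(Record r) {
        Segment tail = segments.isEmpty() ? null : segments.get(segments.size() - 1);
        if (tail == null || tail.records.size() == capacity) {
            tail = new Segment();
            segments.add(tail);
        }
        tail.records.add(r);
        tail.live++;
        r.home = tail;
    }

    /** Above the watermark, rewrite the emptiest full segment that is at least half empty. */
    private void rewriteIfNeeded() {
        while (segments.size() > highWatermark) {
            Segment victim = null;
            for (Segment s : segments) {
                boolean full = s.records.size() == capacity;
                if (full && s.live * 2 <= capacity && (victim == null || s.live < victim.live)) victim = s;
            }
            if (victim == null) return; // nothing is empty enough yet
            segments.remove(victim);
            for (Record r : victim.records) {
                if (!r.liveIn.isEmpty()) { place(r); rewritten++; }
            }
        }
    }

    int segmentCount() { return segments.size(); }
    long rewrittenRecords() { return rewritten; }
    boolean isLive(long id) { return ledger.containsKey(id); }
}
```

In this sketch a record tagged `COMMIT` and `HINT` stays in the log after the commit log invalidates it, and is only copied forward if its segment is chosen for rewriting while the hint is still outstanding.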
This absolutely bounds our worst-case sequential IO at 2x that used by just the commit log, with normal operation under a sufficiently high watermark having zero overhead. The upper bound for space utilisation is the smaller of 2x the actual amount of data stored and our high watermark. This gives us batch and hints for free, and eliminates OOMs from hints.
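One way to see where the 2x IO figure comes from is the following sketch. It assumes fixed-size segments of size $S$ and that only segments at least half empty are ever rewritten. The symbols $W$ (bytes of new writes), $R$ (bytes copied by rewrites), $\ell_i$ (live bytes in the $i$-th rewritten segment), $n$ (segments rewritten) and $d$ (segments deleted outright) are introduced here for illustration.

```latex
% Each rewritten segment is at least half empty:
\ell_i \le \tfrac{S}{2}
\quad\Longrightarrow\quad
\ell_i \le S - \ell_i .

% Space still occupied can never be negative:
W + \sum_{i=1}^{n} \ell_i - (n + d)\,S \;\ge\; 0
\quad\Longrightarrow\quad
\sum_{i=1}^{n} (S - \ell_i) \;\le\; W - dS \;\le\; W .

% Combining the two:
R = \sum_{i=1}^{n} \ell_i \;\le\; \sum_{i=1}^{n} (S - \ell_i) \;\le\; W
\quad\Longrightarrow\quad
W + R \;\le\; 2W .
```

If the watermark is never reached, no segment is rewritten, so $R = 0$ and the log costs exactly the $W$ bytes a plain commit log would write.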