Details
- Improvement
- Status: Resolved
- Normal
- Resolution: Fixed
- None
Description
I see a potential OOM when a stream (e.g. repair) goes through the write path, as it does for tables with MVs.
StreamReceiveTask gets a bunch of SSTableReaders. These produce row iterators, which in turn produce mutations. Every partition therefore becomes a single mutation, so a (very) big partition results in a (very) big mutation. These mutations are created on heap and stay there until they have finished processing.
I don't think it is necessary to create a single mutation for each partition. Why don't we implement a PartitionUpdateGeneratorIterator that takes an UnfilteredRowIterator and a max size and emits PartitionUpdates, which are then used to create and apply mutations?
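To make the idea concrete, here is a minimal sketch of the size-bounded splitting in plain Java. It does not use any Cassandra classes: SizeBoundedBatchIterator, sizeOf and maxBatchBytes are hypothetical names, the generic item type T stands in for a row, and each emitted List<T> stands in for one PartitionUpdate. It assumes the source never yields null and that an item larger than the limit is emitted alone rather than split further.

```java
import java.util.ArrayList;
import java.util.Iterator;
import java.util.List;
import java.util.NoSuchElementException;
import java.util.function.ToLongFunction;

/**
 * Hypothetical stand-in for the proposed PartitionUpdateGeneratorIterator.
 * Groups items from a source iterator into batches whose summed size stays
 * at or below maxBatchBytes, so only one batch is held on heap at a time.
 */
final class SizeBoundedBatchIterator<T> implements Iterator<List<T>> {
    private final Iterator<T> source;
    private final ToLongFunction<T> sizeOf;
    private final long maxBatchBytes;
    // Item already read from the source that did not fit into the previous batch.
    private T carried;

    SizeBoundedBatchIterator(Iterator<T> source, ToLongFunction<T> sizeOf, long maxBatchBytes) {
        if (maxBatchBytes <= 0) {
            throw new IllegalArgumentException("maxBatchBytes must be positive");
        }
        this.source = source;
        this.sizeOf = sizeOf;
        this.maxBatchBytes = maxBatchBytes;
    }

    @Override
    public boolean hasNext() {
        return carried != null || source.hasNext();
    }

    @Override
    public List<T> next() {
        if (!hasNext()) {
            throw new NoSuchElementException();
        }
        List<T> batch = new ArrayList<>();
        long bytes = 0;
        while (carried != null || source.hasNext()) {
            T item;
            if (carried != null) {
                item = carried;
                carried = null;
            } else {
                item = source.next();
            }
            long size = sizeOf.applyAsLong(item);
            // Close the batch before it would exceed the limit. An oversized
            // item is still accepted when the batch is empty, so we always progress.
            if (!batch.isEmpty() && bytes + size > maxBatchBytes) {
                carried = item;
                break;
            }
            batch.add(item);
            bytes += size;
        }
        return batch;
    }
}
```

In the real code, the consumer would build one mutation per emitted batch and apply it before asking for the next one, so heap usage is bounded by the batch size rather than by the partition size.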
The max size should be something like min(reasonable_absolute_max_size, max_mutation_size, commitlog_segment_size / 2). reasonable_absolute_max_size could be something like 16 MB.
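The limit above could be computed roughly as follows. This is only a sketch: MutationSizeLimit and its parameter names are hypothetical, all values are in bytes, and the 16 MiB constant is the suggested ballpark, not a tested value.

```java
/** Hypothetical helper that computes the per-mutation size limit in bytes. */
final class MutationSizeLimit {
    // Suggested upper bound of roughly 16 MB, expressed here as 16 MiB (an assumption).
    static final long REASONABLE_ABSOLUTE_MAX_BYTES = 16L * 1024 * 1024;

    private MutationSizeLimit() {
    }

    /**
     * Returns min(reasonable_absolute_max_size, max_mutation_size,
     * commitlog_segment_size / 2), with all inputs given in bytes.
     */
    static long maxBytes(long maxMutationSizeBytes, long commitLogSegmentSizeBytes) {
        return Math.min(REASONABLE_ABSOLUTE_MAX_BYTES,
                        Math.min(maxMutationSizeBytes, commitLogSegmentSizeBytes / 2));
    }
}
```

With large configured values the absolute cap wins; with small ones the configured mutation size or half the commit log segment size becomes the limit.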
A mutation shouldn't be too large, as its size also affects MV partition locking. The longer an MV partition is locked during a stream, the higher the chance that WTEs (write timeout exceptions) occur during streams.
I could also imagine that a maximum number of updates per mutation, regardless of size in bytes, would make sense to avoid lock contention.
I'd love to get feedback and suggestions, including naming suggestions.
Issue Links
- is related to
  - CASSANDRA-14239 OutOfMemoryError when bootstrapping with less than 100GB RAM (Open)
  - CASSANDRA-12905 Retry acquire MV lock on failure instead of throwing WTE on streaming (Resolved)
  - CASSANDRA-11670 Rebuilding or streaming MV generates mutations larger than max_mutation_size_in_kb (Resolved)
  - CASSANDRA-13787 RangeTombstoneMarker and PartitionDeletion is not properly included in MV (Resolved)
- relates to
  - CASSANDRA-12268 Make MV Index creation robust for wide referent rows (Resolved)