[CASSANDRA-8906] Experiment with optimizing partition merging when we can prove that some sources don't overlap - ASF JIRA

Details

Type: Improvement
Status: Open
Priority: Low
Resolution: Unresolved
Fix Version/s: None
Component/s: Legacy/Core
Labels:
- compaction
- performance

Description

When we merge a partition from two sources and it turns out that those two sources don't overlap for that partition, we still end up doing one comparison per row of the first source. However, if we can prove that the two sources don't overlap, for example by using the sstable min/max clustering values that we store, we could skip those comparisons entirely. Note that in practice it's a little bit more hairy because we need to deal with N sources, but that's probably not too hard either.
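To make the N-source case concrete, here is a minimal sketch of the idea in Python rather than in Cassandra's own code. Everything in it is an assumption for illustration: `Source`, `group_overlapping` and `merge_partition` are hypothetical names, clustering values are plain integers, and reconciliation of rows with equal clustering is left out. Sources are sorted by their min clustering value and grouped into runs whose [min, max] ranges overlap; a run with a single source is copied without any per-row comparison.

```python
import heapq
from dataclasses import dataclass
from typing import List


@dataclass
class Source:
    """Hypothetical stand-in for one sstable's rows for a single partition."""
    rows: List[int]  # clustering values, sorted ascending

    @property
    def min_clustering(self) -> int:
        return self.rows[0]

    @property
    def max_clustering(self) -> int:
        return self.rows[-1]


def group_overlapping(sources: List[Source]) -> List[List[Source]]:
    """Group sources into runs whose [min, max] ranges overlap transitively."""
    ordered = sorted((s for s in sources if s.rows),
                     key=lambda s: s.min_clustering)
    groups: List[List[Source]] = []
    group_max = None
    for s in ordered:
        if groups and s.min_clustering <= group_max:
            # Range touches the current run: it must go through a real merge.
            groups[-1].append(s)
            group_max = max(group_max, s.max_clustering)
        else:
            # Strictly after everything seen so far: start a new run.
            groups.append([s])
            group_max = s.max_clustering
    return groups


def merge_partition(sources: List[Source]) -> List[int]:
    """Merge sources, skipping row comparisons for provably disjoint ones."""
    out: List[int] = []
    for group in group_overlapping(sources):
        if len(group) == 1:
            out.extend(group[0].rows)  # bulk copy, no per-row comparison
        else:
            out.extend(heapq.merge(*(s.rows for s in group)))
    return out
```

The cost of the check is one sort over N sources plus one comparison per source, which is independent of the number of rows in the partition.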

I'll note that using the sstable min/max clustering values is not terribly precise. We could do better if we were to push the same reasoning inside the merge iterator, for instance by using the sstable per-partition index, which can in theory tell us things like "don't bother comparing rows until the end of this row block". This is quite a bit more involved though, so it may not be worth the complexity.
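The block-level variant could look roughly like the following sketch, again in Python and under stated assumptions: `merge_with_block_index` is a hypothetical function, source A is assumed to arrive already split into sorted row blocks (standing in for what a per-partition index would describe), and only A's blocks are checked against B's current head. It is not how the merge iterator is actually structured.

```python
from typing import List


def merge_with_block_index(a_blocks: List[List[int]], b: List[int]) -> List[int]:
    """Merge source A (pre-split into sorted row blocks) with sorted source B.

    A block whose last row sorts before B's current head is copied whole,
    at the cost of one comparison for the block instead of one per row.
    """
    out: List[int] = []
    j = 0  # current position in b
    for block in a_blocks:
        if not block:
            continue
        # Index check: B is exhausted, or the whole block precedes B's head.
        if j == len(b) or block[-1] < b[j]:
            out.extend(block)
            continue
        # Otherwise fall back to the usual row-by-row merge for this block.
        i = 0
        while i < len(block) and j < len(b):
            if block[i] <= b[j]:
                out.append(block[i])
                i += 1
            else:
                out.append(b[j])
                j += 1
        out.extend(block[i:])  # non-empty only when B ran out first
    out.extend(b[j:])
    return out
```

For example, with `a_blocks = [[1, 2], [3, 8], [20, 21]]` and `b = [5, 6, 9]`, the first block is copied after a single comparison, while the other two are merged row by row.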

Issue Links

is related to

CASSANDRA-11697 Improve Compaction Throughput

Open

relates to

CASSANDRA-8731 Optimise merges involving multiple clustering columns

Open

CASSANDRA-8923 Improve MergeIterator performance for binary prefix comparable data

Open

People

Assignee: Unassigned

Reporter: Sylvain Lebresne

Votes: 0

Watchers: 5

Dates

Created: 04/Mar/15 16:34

Updated: 16/Apr/19 09:31