Details
Type: Improvement
Status: Open
Priority: Low
Resolution: Unresolved
None
Description
When we merge a partition from two sources and it turns out that those two sources don't overlap for that partition, we still end up doing one comparison per row in the first source. However, if we can prove that the two sources don't overlap, for example by using the sstable min/max clustering values that we store, we could speed this up. Note that in practice it's a little bit more hairy because we need to deal with N sources, but that's probably not too hard either.
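To make the N-source case concrete, here is a minimal sketch, not Cassandra code. It assumes clusterings can be modelled as plain ints and that each source exposes inclusive min/max bounds. The names `DisjointSources`, `Bounds` and `concatenationOrder` are hypothetical. The idea: sort the sources by their min bound, and if every source's max is strictly below the next source's min, no two sources can overlap, so they could be concatenated in that order with no per-row comparisons.

```java
import java.util.ArrayList;
import java.util.Comparator;
import java.util.List;
import java.util.Optional;

final class DisjointSources {
    /** Hypothetical stand-in for one sstable's stored min/max clustering values. */
    static final class Bounds {
        final String name;
        final int min; // smallest clustering in this source (inclusive)
        final int max; // largest clustering in this source (inclusive)

        Bounds(String name, int min, int max) {
            if (min > max) throw new IllegalArgumentException("min > max");
            this.name = name;
            this.min = min;
            this.max = max;
        }
    }

    /**
     * Returns the sources in the order they could be concatenated if no two
     * of them overlap, or an empty Optional if a real merge is still needed.
     */
    static Optional<List<Bounds>> concatenationOrder(List<Bounds> sources) {
        List<Bounds> sorted = new ArrayList<>(sources);
        sorted.sort(Comparator.comparingInt((Bounds b) -> b.min));
        for (int i = 1; i < sorted.size(); i++) {
            // Once sorted by min, checking adjacent pairs is enough. Equal
            // bounds count as overlap, since both sources may hold that row.
            if (sorted.get(i - 1).max >= sorted.get(i).min)
                return Optional.empty();
        }
        return Optional.of(sorted);
    }
}
```

The check costs one sort plus N-1 comparisons per partition, independent of the number of rows. Since min/max values are coarse, an empty result only means "could not prove it"; the sources might still not interleave.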
I'll note that using the sstable min/max clustering values is not terribly precise. We could do better if we were to push the same reasoning inside the merge iterator, for instance by using the sstable per-partition index, which can in theory tell us things like "don't bother comparing rows until the end of this row block". This is quite a bit more involved though, so maybe not worth the complexity.
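As a rough illustration of the row-block idea, the sketch below merges two sorted sources, where the first is split into fixed-size blocks whose last clustering is assumed to be known up front (standing in for what a per-partition index entry could provide). Everything here is a simplification: clusterings are ints, `BlockSkippingMerge` and its members are hypothetical names, and equal rows are simply emitted from both sources instead of being reconciled as a real merge would do.

```java
import java.util.ArrayList;
import java.util.List;

final class BlockSkippingMerge {
    /** Merged rows plus the number of clustering comparisons performed. */
    static final class Result {
        final List<Integer> rows = new ArrayList<>();
        int comparisons = 0;
    }

    /** Merges sorted arrays a and b; a is viewed as blocks of blockSize rows. */
    static Result merge(int[] a, int blockSize, int[] b) {
        if (blockSize <= 0) throw new IllegalArgumentException("blockSize must be > 0");
        Result r = new Result();
        int j = 0; // next unread row of b
        for (int start = 0; start < a.length; start += blockSize) {
            int end = Math.min(start + blockSize, a.length);
            boolean skip;
            if (j == b.length) {
                skip = true; // b is exhausted, nothing left to compare against
            } else {
                // One comparison against the block's last row decides whether
                // the whole block sorts before the current head of b.
                r.comparisons++;
                skip = a[end - 1] < b[j];
            }
            for (int i = start; i < end; i++) {
                if (!skip) {
                    // Regular path: emit rows of b that sort before a[i].
                    while (j < b.length) {
                        r.comparisons++;
                        if (b[j] >= a[i]) break;
                        r.rows.add(b[j++]);
                    }
                }
                r.rows.add(a[i]);
            }
        }
        while (j < b.length) r.rows.add(b[j++]); // remaining tail of b
        return r;
    }
}
```

For example, with `a = {1,2,3,4,5,6,7,8}`, `blockSize = 4` and `b = {6, 9}`, the first block is copied after a single comparison and only the second block is merged row by row. The trade-off is one extra comparison for every block that cannot be skipped.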
Issue Links
- is related to
  - CASSANDRA-11697 Improve Compaction Throughput (Open)
- relates to
  - CASSANDRA-8731 Optimise merges involving multiple clustering columns (Open)
  - CASSANDRA-8923 Improve MergeIterator performance for binary prefix comparable data (Open)