[CASSANDRA-8915] Improve MergeIterator performance - ASF JIRA

Details

Type: Improvement
Status: Resolved
Priority: Low
Resolution: Fixed
Fix Version/s: 3.0 alpha 1
Component/s: None
Labels:

Description

The implementation of MergeIterator uses a priority queue and applies a pair of poll+add operations for every item in the resulting sequence. This is inefficient: poll necessarily performs at least log N comparisons (and up to 2 log N), and add often requires another log N, for example when the inputs largely don't overlap. Here N is the number of iterators being merged.

This can easily be replaced with a simple custom structure that replaces the top of the queue in a single step. That step will very often complete after a couple of comparisons, and in the worst case it matches the complexity of the current implementation.
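
To illustrate the idea, here is a minimal sketch of a k-way merge over a binary min-heap whose top entry is replaced in place by a single sift-down, instead of a poll followed by an add. It is not the code committed for this ticket: the class name ReplaceTopMerge, the Source helper, and the restriction to lists of integers are all assumptions made for the example.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Iterator;
import java.util.List;

/** Hypothetical sketch: merges sorted inputs by replacing the heap top in one step. */
final class ReplaceTopMerge {
    /** One input iterator together with its current (not yet emitted) head item. */
    private static final class Source {
        final Iterator<Integer> it;
        int head;
        Source(Iterator<Integer> it) { this.it = it; this.head = it.next(); }
    }

    private final Source[] heap; // binary min-heap ordered by Source.head
    private int size;

    ReplaceTopMerge(List<List<Integer>> sortedInputs) {
        heap = new Source[sortedInputs.size()];
        for (List<Integer> input : sortedInputs) {
            Iterator<Integer> it = input.iterator();
            if (it.hasNext()) heap[size++] = new Source(it); // skip empty inputs
        }
        // Standard bottom-up heap construction.
        for (int i = size / 2 - 1; i >= 0; i--) siftDown(i);
    }

    /** Moves heap[i] down until neither child has a smaller head. */
    private void siftDown(int i) {
        Source moving = heap[i];
        while (true) {
            int child = 2 * i + 1;
            if (child >= size) break;
            // Pick the smaller of the two children.
            if (child + 1 < size && heap[child + 1].head < heap[child].head) child++;
            // Common case for non-overlapping inputs: stop right at the root.
            if (moving.head <= heap[child].head) break;
            heap[i] = heap[child];
            i = child;
        }
        heap[i] = moving;
    }

    boolean hasNext() { return size > 0; }

    int next() {
        Source top = heap[0];
        int result = top.head;
        if (top.it.hasNext()) {
            top.head = top.it.next();   // advance the source in place
        } else {
            heap[0] = heap[--size];     // source exhausted: move the last entry to the top
            heap[size] = null;
        }
        if (size > 0) siftDown(0);      // single restore step replaces poll + add
        return result;
    }

    /** Convenience wrapper that drains the merge into a list. */
    static List<Integer> merge(List<List<Integer>> sortedInputs) {
        ReplaceTopMerge merger = new ReplaceTopMerge(sortedInputs);
        List<Integer> out = new ArrayList<>();
        while (merger.hasNext()) out.add(merger.next());
        return out;
    }
}
```

When the source at the top still holds the smallest item after advancing, siftDown(0) stops at the root after at most two comparisons (one between the children, one against the smaller child). Otherwise it walks down at most one root-to-leaf path, which is the same order of work as a poll on a binary heap.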

This should significantly improve merge performance for iterators with limited overlap (e.g. levelled compaction).

Issue Links

blocks
CASSANDRA-8180 Optimize disk seek using min/max column name meta data when the LIMIT clause is used (Resolved)

relates to
CASSANDRA-8731 Optimise merges involving multiple clustering columns (Open)
CASSANDRA-8923 Improve MergeIterator performance for binary prefix comparable data (Open)

People

Assignee: Branimir Lambov
Reporter: Branimir Lambov
Authors: Branimir Lambov
Reviewers: Benedict Elliott Smith
Votes: 0
Watchers: 4

Dates

Created: 05/Mar/15 10:16
Updated: 16/Apr/19 09:31
Resolved: 16/Jul/15 11:45

Time Tracking

Estimated: Not Specified
Remaining:
Logged: 0.2h