[CASSANDRA-2677] Optimize streaming to be single-pass - ASF JIRA

Details

Type: Improvement
Status: Resolved
Priority: Low
Resolution: Fixed
Fix Version/s: 1.0.0
Component/s: None
Labels: None

Description

Streaming is currently a two-pass operation: one pass to write the Data component to disk from the socket, then another to build the index and bloom filter from it. This means we do about 2x the I/O we would if we created the index and bloom filter (BF) during the original write.
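
A minimal sketch of the single-pass idea, assuming a simplified row format: each row is appended to the data output while its index entry and bloom filter bits are recorded in the same step. The class `SinglePassWriterSketch`, its row layout, and its two-hash bloom filter are hypothetical illustrations, not Cassandra's actual SSTable writer.

```java
import java.io.ByteArrayOutputStream;
import java.io.DataOutputStream;
import java.io.IOException;
import java.io.UncheckedIOException;
import java.util.ArrayList;
import java.util.BitSet;
import java.util.List;

// Hypothetical single-pass writer (illustration only, not Cassandra code).
final class SinglePassWriterSketch {
    // In-memory stand-in for the on-disk Data component.
    private final ByteArrayOutputStream data = new ByteArrayOutputStream();
    private final DataOutputStream out = new DataOutputStream(data);
    // Index built during the write: key -> offset of its row in the data output.
    final List<String> indexKeys = new ArrayList<>();
    final List<Long> indexOffsets = new ArrayList<>();
    // Toy bloom filter; its size must be chosen before the first row arrives.
    private final BitSet bloom;
    private final int bloomBits;

    SinglePassWriterSketch(int bloomBits) {
        this.bloomBits = bloomBits;
        this.bloom = new BitSet(bloomBits);
    }

    // Writes one row and updates the index and bloom filter in the same pass.
    void append(String key, byte[] row) {
        long offset = out.size(); // bytes written so far = start of this row
        try {
            out.writeUTF(key);        // assumed layout: key, row length, row bytes
            out.writeInt(row.length);
            out.write(row);
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        indexKeys.add(key);
        indexOffsets.add(offset);
        for (int i = 0; i < 2; i++) {
            bloom.set(bucket(key, i));
        }
    }

    // Returns false only if the key was definitely never appended.
    boolean mightContain(String key) {
        for (int i = 0; i < 2; i++) {
            if (!bloom.get(bucket(key, i))) {
                return false;
            }
        }
        return true;
    }

    long dataLength() {
        return out.size();
    }

    // Derives the i-th bit position from two simple hashes of the key.
    private int bucket(String key, int i) {
        int h1 = key.hashCode();
        int h2 = Integer.reverse(h1) | 1;
        return Math.floorMod(h1 + i * h2, bloomBits);
    }
}
```

Note that the bloom filter size is fixed in the constructor, before any row is seen, so the writer has to be told the expected key count up front.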

For node movement this was not considered a big deal, because the stream target is not a member of the ring, so we can be inefficient without hurting live queries. But optimizing node movement to not require un/rebootstrap (~~CASSANDRA-1427~~) and bulk load (~~CASSANDRA-1278~~) mean that we can now stream to live nodes too.

The main obstacle is that we don't know ahead of time how many keys the new sstable will contain, and we need that count to size the bloom filter correctly. We can solve this by including that information (or a close approximation) in the stream setup; the source node can calculate it from the in-memory index summary without hitting disk.
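
As a rough sketch of how the estimate could feed bloom filter sizing: if the index summary samples one key per index interval, the source can approximate the key count as the number of summary entries times that interval, and the target can derive the filter size from the standard formulas m = -n ln(p) / (ln 2)^2 bits and k = (m/n) ln 2 hash functions. The class and method names, the interval of 128, and the 1% false-positive target are assumptions for illustration, not values taken from the patch.

```java
// Hypothetical helper (illustration only, not Cassandra code).
final class BloomSizingSketch {
    // Approximates the key count from the in-memory index summary,
    // assuming the summary holds one sampled key per indexInterval keys.
    static long estimateKeys(int summaryEntries, int indexInterval) {
        return (long) summaryEntries * indexInterval;
    }

    // Bits needed for n keys at false-positive rate p: m = -n ln(p) / (ln 2)^2.
    static long bloomBits(long n, double p) {
        double ln2 = Math.log(2);
        return (long) Math.ceil(-n * Math.log(p) / (ln2 * ln2));
    }

    // Hash function count that minimizes false positives: k = (m / n) ln 2.
    static int hashCount(long bits, long n) {
        return Math.max(1, (int) Math.round((double) bits / n * Math.log(2)));
    }

    public static void main(String[] args) {
        long keys = estimateKeys(1000, 128); // assumed: 1000 summary entries, interval 128
        long bits = bloomBits(keys, 0.01);   // assumed: 1% false-positive target
        System.out.println(keys + " keys, " + bits + " bits, "
                + hashCount(bits, keys) + " hashes");
    }
}
```

Because the summary is a sample, the estimate can be off by up to one interval per source sstable, which is why a close approximation is good enough: a small error in n shifts the false-positive rate only slightly.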

Attachments

ASF.LICENSE.NOT.GRANTED--trunk-2677.txt (39 kB, uploaded 13/Jul/11 14:47 by Yuki Morishita)

People

Assignee: Yuki Morishita

Reporter: Jonathan Ellis

Authors: Yuki Morishita

Reviewers: Jonathan Ellis

Votes: 0

Watchers: 7

Dates

Created: 20/May/11 19:58

Updated: 16/Apr/19 09:32

Resolved: 13/Jul/11 16:26