[CASSANDRA-3200] Repair: compare all trees together (for a given range/cf) instead of by pair in isolation - ASF JIRA

Details

Type: Improvement
Status: Resolved
Priority: Low
Resolution: Fixed
Fix Version/s: 4.0-alpha1, 4.0
Component/s: None
Labels:
- repair

Description

Currently, repair compares merkle trees by pair, in isolation from any other tree. Concretely, if I have three nodes A, B and C (RF=3) with A and B in sync, but C having some range r inconsistent with both A and B (which are consistent with each other), we will do the following transfers of r: A -> C, C -> A, B -> C, C -> B.

The fact that we do both A -> C and C -> A is fine, because we cannot know which of A or C is more up to date. However, the transfer B -> C is useless provided we do A -> C, since A and B are in sync. Not doing that transfer would be a 25% improvement in that case (3 transfers instead of 4). With RF=5 and only one node inconsistent with all the others, that is almost a 40% improvement (5 transfers instead of 8), etc.
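The arithmetic above can be checked with a small sketch. This is not Cassandra's repair code: the function names (`pairwise_transfers`, `grouped_transfers`, `saving`) are hypothetical, and a plain "version" label per node stands in for the merkle tree hash of range r. It only counts transfers under the two strategies, assuming every node should receive each differing version from exactly one node that holds it.

```python
from itertools import combinations


def pairwise_transfers(versions):
    """Transfers when every mismatching pair streams the range both ways.

    `versions` maps a node name to a label standing in for the merkle
    tree hash of range r on that node (an assumption for illustration).
    """
    transfers = []
    for a, b in combinations(sorted(versions), 2):
        if versions[a] != versions[b]:
            transfers.append((a, b))  # a streams r to b
            transfers.append((b, a))  # b streams r to a
    return transfers


def grouped_transfers(versions):
    """Transfers when all trees are compared together.

    Nodes are grouped by version; each node then receives every
    version it lacks from a single holder of that version.
    """
    groups = {}
    for node in sorted(versions):
        groups.setdefault(versions[node], []).append(node)

    transfers = []
    for node in sorted(versions):
        for version, holders in groups.items():
            if version != versions[node]:
                transfers.append((holders[0], node))
    return transfers


def saving(rf):
    """Fraction of transfers avoided when one of `rf` replicas is out of sync."""
    versions = {f"n{i}": "in_sync" for i in range(rf - 1)}
    versions[f"n{rf - 1}"] = "stale"
    before = len(pairwise_transfers(versions))
    after = len(grouped_transfers(versions))
    return 1 - after / before


print(pairwise_transfers({"A": 1, "B": 1, "C": 2}))
# [('A', 'C'), ('C', 'A'), ('B', 'C'), ('C', 'B')]
print(grouped_transfers({"A": 1, "B": 1, "C": 2}))
# [('C', 'A'), ('C', 'B'), ('A', 'C')]
print(saving(3), saving(5))
# 0.25 0.375
```

With one stale replica out of n, pairwise comparison yields 2(n-1) transfers and the grouped comparison yields n, which matches the 25% (RF=3) and 37.5% (RF=5) figures.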

Given that this situation of one node being out of sync while the others are in sync is probably fairly common (one node died, so it is behind), this could be a fair reduction in what is transferred. In the case where we use repair to completely rebuild a node, this will be a dramatic improvement, because it will keep the rebuilt node from receiving RF times the data it should get.

Issue Links

is duplicated by
- CASSANDRA-5972 Reduce the amount of data to be transferred during repair (Open)
- CASSANDRA-11965 Duplicated effort in repair streaming (Resolved)

relates to
- CASSANDRA-16274 Improve performance when calculating StreamTasks with optimised streaming (Resolved)

People

Assignee: Marcus Eriksson
Reporter: Sylvain Lebresne
Authors: Marcus Eriksson
Reviewers: Blake Eggleston
Votes: 0
Watchers: 16

Dates

Created: 13/Sep/11 14:58
Updated: 22/Mar/22 21:35
Resolved: 07/Dec/17 12:59