[CASSANDRA-17342] Performance problem for node restart with incremental range repairs - ASF JIRA

Details

Type: Bug
Status: Resolved
Priority: Normal
Resolution: Fixed
Fix Version/s: 4.0.3, 4.1-alpha1, 4.1
Component/s: Consistency/Repair
Labels: None
Bug Category: Degradation - Performance Bug/Regression
Severity: Normal
Complexity: Normal
Discovered By: User Report
Platform: All
Impacts: None
Since Version: 4.0.0
Source Control Link: https://github.com/apache/cassandra/commit/c60ad61b3b6145af100578f2c652819f61729018
Test and Documentation Plan: run CI

Description

There is a performance problem when restarting Cassandra on clusters that run incremental repairs as range repairs.

We have clusters with 16 vnodes per node and split each vnode into 100 ranges. As a result, a node takes over 30 minutes to process the data stored in the system.repairs table before it can restart. Even when we reduce this to 10 ranges per vnode, processing still takes 2 minutes. The cluster has 22 keyspaces and an RF of 3, which creates around 8100 records in the system.repairs table.

The problem seems to occur in the org.apache.cassandra.repair.consistent.RepairedState class, where the add method reprocesses the complete list, including sorting it, every time a new Range is added. This leads to faster-than-linear growth in processing time as the number of ranges increases, as demonstrated in the attached unit test.
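To illustrate why re-sorting on every add becomes expensive, here is a minimal sketch. It is not the Cassandra code: `PerAddResortDemo` and `elementsSortedPerAdd` are hypothetical names, and a plain list of `Long` values stands in for the stored ranges. It only counts how many elements are handed to the sort across all adds, which comes to n(n+1)/2 for n adds.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;

// Hypothetical demo class; it models the cost pattern only, not Cassandra's actual state handling.
class PerAddResortDemo {

    /**
     * Adds n values one at a time and re-sorts the whole list after each add.
     * Returns the total number of elements passed to the sort over all adds.
     */
    static long elementsSortedPerAdd(int n) {
        List<Long> state = new ArrayList<>();
        long touched = 0;
        for (int i = 0; i < n; i++) {
            state.add((long) (n - i));   // a new range arrives
            Collections.sort(state);     // the complete list is sorted again
            touched += state.size();     // size of the list sorted on this add
        }
        return touched;
    }

    public static void main(String[] args) {
        // 810 and 8100 are illustrative record counts, ten times apart.
        for (int n : new int[] {810, 8100}) {
            System.out.println(n + " adds -> " + elementsSortedPerAdd(n) + " elements sorted in total");
        }
    }
}
```

Ten times as many records means roughly a hundred times as many elements passed through the sort, which is the kind of growth the description reports.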

I have created a change in the org.apache.cassandra.repair.consistent.LocalSessions class that collects the data read from the system.repairs table and then processes it as a group at the end. This reduces the processing time to a couple of seconds, even for the 100-range version.
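The collect-then-process idea can be sketched as follows. This is an illustration under stated assumptions, not the attached patch: `BulkLoadSketch`, `Row`, `collect`, and `mergeOnce` are hypothetical names, ranges are simplified to non-wrapping `{left, right}` pairs of `long`, and the per-range repair timestamp is left out.

```java
import java.util.ArrayList;
import java.util.Comparator;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Hypothetical sketch: gather all rows first, then sort and merge once per table.
class BulkLoadSketch {

    /** Hypothetical row: one repaired range recorded for a table. */
    static final class Row {
        final String table;
        final long left;
        final long right;

        Row(String table, long left, long right) {
            this.table = table;
            this.left = left;
            this.right = right;
        }
    }

    /** Step 1: group ranges by table without doing any sorting or merging yet. */
    static Map<String, List<long[]>> collect(List<Row> rows) {
        Map<String, List<long[]>> byTable = new HashMap<>();
        for (Row row : rows) {
            byTable.computeIfAbsent(row.table, t -> new ArrayList<>())
                   .add(new long[] {row.left, row.right});
        }
        return byTable;
    }

    /** Step 2: sort a table's ranges once, then merge overlapping or adjacent ones in one pass. */
    static List<long[]> mergeOnce(List<long[]> ranges) {
        List<long[]> sorted = new ArrayList<>(ranges);
        sorted.sort(Comparator.comparingLong((long[] r) -> r[0]));
        List<long[]> merged = new ArrayList<>();
        for (long[] r : sorted) {
            if (!merged.isEmpty() && r[0] <= merged.get(merged.size() - 1)[1]) {
                long[] last = merged.get(merged.size() - 1);
                last[1] = Math.max(last[1], r[1]);      // extend the previous range
            } else {
                merged.add(new long[] {r[0], r[1]});    // start a new range (copy, so inputs stay untouched)
            }
        }
        return merged;
    }

    public static void main(String[] args) {
        List<Row> rows = new ArrayList<>();
        rows.add(new Row("ks1.t1", 10, 20));
        rows.add(new Row("ks1.t1", 0, 10));
        rows.add(new Row("ks1.t1", 30, 40));
        rows.add(new Row("ks2.t1", 5, 15));
        for (Map.Entry<String, List<long[]>> e : collect(rows).entrySet()) {
            System.out.println(e.getKey() + ": " + mergeOnce(e.getValue()).size() + " merged range(s)");
        }
    }
}
```

With this shape, each table's ranges are sorted once regardless of how many rows were read, instead of once per row.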

This is my first attempt at changing the Cassandra code, so I need a mentor to help me with the process and to validate what I have done.

Attachments

- RepairedState.java, 02/Feb/22 21:02, 12 kB, Paul Chandler
- BulkRepairStateTest.java, 02/Feb/22 21:02, 5 kB, Paul Chandler
- LocalSessions.java, 02/Feb/22 21:02, 42 kB, Paul Chandler
- IncrementalRepairStartupTest.java, 02/Feb/22 18:17, 1 kB, Paul Chandler

People

Assignee: Paul Chandler

Reporter: Paul Chandler

Authors: Paul Chandler

Reviewers: Brandon Williams, Marcus Eriksson

Votes: 0

Watchers: 5

Dates

Created: 02/Feb/22 18:18

Updated: 27/May/22 19:25

Resolved: 11/Feb/22 13:17