[CASSANDRA-13797] RepairJob blocks on syncTasks - ASF JIRA

Details

Type: Bug
Status: Resolved
Priority: Normal
Resolution: Fixed
Fix Version/s: 3.0.15, 3.11.1, 4.0-alpha1, 4.0
Component/s: Consistency/Repair
Labels: None
Severity: Normal

Description

The thread running RepairJob blocks while it waits for the validations it starts to complete (see here). However, the downstream callbacks (i.e. the post-repair cleanup) aren't waiting for RepairJob#run to return; they're waiting for a result to be set on the RepairJob future, which happens after the sync tasks have completed. This post-repair cleanup also immediately shuts down the executor that RepairJob#run is running in. So in noop repair sessions, where there's nothing to stream, I'm seeing the callbacks sometimes fire before RepairJob#run wakes up, causing an InterruptedException to be thrown.
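The interleaving described above can be reproduced in a small, deterministic sketch. It is a hypothetical model, not Cassandra's code: plain JDK CompletableFuture stands in for the futures Cassandra actually uses, and the names (runJob, sessionExecutor, validations, jobResult, blockInRun) are illustrative assumptions. The cleanup callback is keyed on the job's result future and calls shutdownNow() on the executor the job runs in; to force the reported ordering, the result is completed while the job thread is still parked in its wait.

```java
import java.util.concurrent.CompletableFuture;
import java.util.concurrent.CountDownLatch;
import java.util.concurrent.ExecutionException;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.atomic.AtomicBoolean;

// Hypothetical model of the hazard; names are illustrative, not Cassandra's RepairJob code.
public class RepairJobInterruptSketch {

    /**
     * Runs a simulated job on its own single-thread "session" executor and
     * returns true if the job thread saw an InterruptedException.
     */
    static boolean runJob(boolean blockInRun) throws InterruptedException {
        ExecutorService sessionExecutor = Executors.newSingleThreadExecutor();
        CompletableFuture<Void> validations = new CompletableFuture<>();
        CompletableFuture<Void> jobResult = new CompletableFuture<>();
        CountDownLatch runStarted = new CountDownLatch(1);
        AtomicBoolean sawInterrupt = new AtomicBoolean(false);

        // Post-repair cleanup waits on the job's result future, not on run()
        // returning, and shuts down the executor that run() is executing on.
        jobResult.whenComplete((ignored, error) -> sessionExecutor.shutdownNow());

        sessionExecutor.execute(() -> {
            runStarted.countDown();
            if (!blockInRun) {
                return; // Non-blocking shape: return instead of waiting on validations.
            }
            try {
                validations.get(); // The blocking wait the ticket proposes removing.
            } catch (InterruptedException e) {
                sawInterrupt.set(true);
                Thread.currentThread().interrupt(); // Preserve the interrupt status.
            } catch (ExecutionException e) {
                throw new IllegalStateException(e);
            }
        });

        runStarted.await();
        // Simulates the result being set (nothing to stream) before run() has
        // returned from its wait; here the wait is never released before cleanup.
        jobResult.complete(null);
        sessionExecutor.awaitTermination(5, TimeUnit.SECONDS);
        validations.complete(null);
        return sawInterrupt.get();
    }

    public static void main(String[] args) throws InterruptedException {
        System.out.println("blocking run():     interrupted = " + runJob(true));
        System.out.println("non-blocking run(): interrupted = " + runJob(false));
    }
}
```

With blockInRun set to true, the job thread should be interrupted out of validations.get() by the cleanup's shutdownNow(); with false, run() has already returned, so the same cleanup has nothing to interrupt.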

I'm pretty sure this blocking wait can just be removed, but I'd like a second opinion. It appears to be a holdover from before repair coordination became async. I thought it might be providing some throttling by blocking, but each repair session gets its own executor, and validation is throttled by the fixed-size executors that do the actual validation work, so I don't think we need to keep it.
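The throttling argument can be illustrated with a fixed-size pool: however many validations a submitter hands off without waiting, at most poolSize of them run at once and the rest queue inside the executor. This is a minimal sketch that assumes the validation executor really is fixed-size, as the argument requires; the pool size, task count, sleep, and names (peakConcurrency, validationExecutor) are hypothetical and are not Cassandra's configuration.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.CompletableFuture;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.atomic.AtomicInteger;

// Hypothetical illustration: sizes and names are assumptions, not Cassandra's settings.
public class ValidationThrottleSketch {

    /** Submits `jobs` simulated validations without blocking; returns the peak concurrency seen. */
    static int peakConcurrency(int poolSize, int jobs) throws InterruptedException {
        ExecutorService validationExecutor = Executors.newFixedThreadPool(poolSize);
        AtomicInteger running = new AtomicInteger();
        AtomicInteger peak = new AtomicInteger();
        List<CompletableFuture<Void>> validations = new ArrayList<>();

        for (int i = 0; i < jobs; i++) {
            // The submitter never waits; tasks beyond poolSize queue inside the executor.
            validations.add(CompletableFuture.runAsync(() -> {
                int now = running.incrementAndGet();
                peak.accumulateAndGet(now, Math::max);
                try {
                    Thread.sleep(10); // Stand-in for validation work (e.g. building a Merkle tree).
                } catch (InterruptedException e) {
                    Thread.currentThread().interrupt();
                } finally {
                    running.decrementAndGet();
                }
            }, validationExecutor));
        }

        // Only the caller of this helper waits, once, for everything to finish.
        CompletableFuture.allOf(validations.toArray(new CompletableFuture<?>[0])).join();
        validationExecutor.shutdown();
        validationExecutor.awaitTermination(5, TimeUnit.SECONDS);
        return peak.get();
    }

    public static void main(String[] args) throws InterruptedException {
        System.out.println("peak concurrent validations: " + peakConcurrency(2, 20));
    }
}
```

The peak can never exceed the pool size, because the pool's thread count, not the submitter, bounds how many tasks run at once; that bound disappears if the executor doing the work is not actually fixed-size.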

Issue Links

causes

CASSANDRA-14332 Fix unbounded validation compactions on repair

Resolved

fixes

CASSANDRA-15902 OOM because repair session thread not closed when terminating repair

Resolved

is duplicated by

CASSANDRA-13555 Thread leak during repair

Resolved

People

Assignee: Blake Eggleston

Reporter: Blake Eggleston

Authors: Blake Eggleston

Reviewers: Marcus Eriksson

Votes: 0

Watchers: 8

Dates

Created: 24/Aug/17 22:15

Updated: 03/Sep/20 11:56

Resolved: 14/Mar/18 12:56