Details
- Improvement
- Status: Open
- Normal
- Resolution: Unresolved
- None
- None
- Challenging
Description
Most of the repair flow is fire-and-forget: we send a message or start a job on a remote component, then wait either for a response of some kind or for the failure detector to tell us a node has died. This leaves several cases where a repair can hang, and operators have to guess at its state and at the best course of action. It would help if the state of a given repair could be polled and possibly cancelled. This will involve touching the validation, anti-compaction, and streaming code.
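To make the request concrete, here is a rough sketch of what a pollable, cancellable repair state could look like. Everything in it is hypothetical: `RepairStateTracker`, the `Phase` enum, and the `poll`/`advance`/`cancel` methods are illustrative names, not existing Cassandra classes or APIs, and the phase names are only borrowed from the components mentioned above.

```java
import java.util.concurrent.atomic.AtomicReference;

// Hypothetical sketch only: none of these names exist in Cassandra.
final class RepairStateTracker {
    // Illustrative phases; no ordering between them is enforced here.
    enum Phase {
        INIT, VALIDATION, STREAMING, ANTI_COMPACTION, COMPLETED, FAILED, CANCELLED;

        boolean isTerminal() {
            return this == COMPLETED || this == FAILED || this == CANCELLED;
        }
    }

    private final AtomicReference<Phase> phase = new AtomicReference<>(Phase.INIT);

    // Non-blocking read of the current phase, so an operator can poll
    // instead of waiting for a completion notification.
    Phase poll() {
        return phase.get();
    }

    // Move to the next phase unless the repair already reached a terminal
    // phase. Returns false if the transition was rejected.
    boolean advance(Phase next) {
        while (true) {
            Phase current = phase.get();
            if (current.isTerminal()) {
                return false;
            }
            if (phase.compareAndSet(current, next)) {
                return true;
            }
        }
    }

    // Request cancellation; succeeds only while the repair is still running.
    boolean cancel() {
        return advance(Phase.CANCELLED);
    }
}
```

In this sketch the validation, streaming, and anti-compaction code would each call `advance` when they start and check `poll()` at safe points so that a cancel is noticed. How that would be wired into the real components is left open here.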
Issue Links
- is related to
  - CASSANDRA-15399 Add ability to track state in repair (Resolved)
- relates to
  - CASSANDRA-13480 nodetool repair can hang forever if we lose the notification for the repair completing/failing (Resolved)
  - CASSANDRA-14435 Diag. Events: JMX events (Resolved)