Details
Type: Improvement
Status: Open
Priority: Normal
Resolution: Unresolved
Description
There have been many attempts to automate repair in Cassandra, which makes sense given that repair is necessary to give our users eventual consistency. Most recently, CASSANDRA-10070, CASSANDRA-8911 and CASSANDRA-13924 have all looked for ways to solve this problem.
At Netflix we've built a scheduled repair service within Priam (our sidecar), which we spoke about last year at NGCC. Given the positive feedback at NGCC, we focused on getting it production ready, and for the past six months we have been using it in production to repair hundreds of clusters, tens of thousands of nodes, and petabytes of data. Also based on feedback at NGCC, we have invested effort in figuring out how to integrate this natively into Cassandra rather than open sourcing it as an external service (e.g. in Priam).
As such, vinaykumarcse and I would like to rework and merge our implementation into Cassandra, and have created a design document showing how we plan to make that happen, including the user interface.
As we work on migrating the code from Priam to Cassandra, we would greatly appreciate any feedback on the interface or the v1 implementation features. In the document I have tried to call out the features we explicitly consider future work (along with a path for implementing them later), because I would very much like to get this done before the 4.0 merge window closes, and I think that will require aggressively pruning scope.
Attachments
Issue Links
- duplicates
  - CASSANDRA-19918 Automated Repair Inside Cassandra (Triage Needed)
- is related to
  - CASSSIDECAR-24 C* Management process (Resolved)
- relates to
  - CASSANDRA-8911 Consider Mutation-based Repairs (Open)
  - CASSANDRA-10070 Automatic repair scheduling (Open)
- links to