I'm sorry, but I still think we are returning the wrong number to the user. To be clear, this is nothing against the code of the patch itself; I just think that, given the way repair works, it is not so simple to have a "time since last successful repair".
The "unit" of a repair is a given keyspace, column family and range. Because of that, I don't think we can return a single "time since last successful repair" for a given keyspace and column family; it has to include the range somehow. Granted, so far a nodetool repair repairs all the ranges of the node you launch it on, but I don't think this should be the case (CASSANDRA-2610). Moreover, even now, one of the ranges can fail while the others succeed. So returning a single number for all ranges is wrong.
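To illustrate what I mean by the range being part of the unit, here is a rough sketch. It is purely hypothetical: `RepairHistory`, `record` and `since_last_success` are made-up names, not anything that exists in Cassandra, and ranges are modelled as plain `(start, end]` integer pairs. The point is only that the natural key is (keyspace, column family, range), and a failed session on one range leaves the other ranges' timestamps alone.

```python
from dataclasses import dataclass, field
from typing import Dict, Optional, Tuple

# Hypothetical key for one unit of repair: (keyspace, column family, token range).
# Ranges are simplified to (start, end] pairs of integer tokens.
RepairUnit = Tuple[str, str, Tuple[int, int]]


@dataclass
class RepairHistory:
    """Hypothetical per-range record of the last successful repair time."""
    last_success: Dict[RepairUnit, float] = field(default_factory=dict)

    def record(self, keyspace: str, cf: str, rng: Tuple[int, int],
               succeeded: bool, when: float) -> None:
        # Only a successful session for this exact range updates its timestamp;
        # a failure on one range does not touch the other ranges.
        if succeeded:
            self.last_success[(keyspace, cf, rng)] = when

    def since_last_success(self, keyspace: str, cf: str, rng: Tuple[int, int],
                           now: float) -> Optional[float]:
        # None means this range has never been repaired successfully.
        t = self.last_success.get((keyspace, cf, rng))
        return None if t is None else now - t


# One "nodetool repair" covering two ranges, where the second range fails.
history = RepairHistory()
history.record("ks", "cf", (0, 100), succeeded=True, when=1000.0)
history.record("ks", "cf", (100, 200), succeeded=False, when=1000.0)

print(history.since_last_success("ks", "cf", (0, 100), now=1500.0))    # 500.0
print(history.since_last_success("ks", "cf", (100, 200), now=1500.0))  # None
```

Any single per-column-family number would have to pick either the first answer or the second, and either choice misrepresents one of the two ranges.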
The other problem is that I'm not convinced recording the information only on the node coordinating the repair is very helpful. When you start a repair on a node, you also repair its neighbors (for the ranges they share), so recording the time only on the node that nodetool happened to connect to is arbitrary, and it will convey the idea that repair should be started for every range on every node (while I strongly think that the short-term goal should be to make it easy NOT to do that –
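To make the neighbor point concrete, here is a toy model. It assumes SimpleStrategy-like placement (the range owned by a node is also stored on the next RF - 1 nodes clockwise) and identifies each range by the index of its owning node; the helper names are hypothetical and not Cassandra's API.

```python
from typing import Dict, List


def replicas(owner: int, ring_size: int, rf: int) -> List[int]:
    # Assumed SimpleStrategy-like placement: the range owned by `owner`
    # lives on `owner` and the next rf - 1 nodes clockwise on the ring.
    return [(owner + k) % ring_size for k in range(rf)]


def ranges_repaired_by(node: int, ring_size: int, rf: int) -> Dict[int, List[int]]:
    """Map each range (keyed by its owner) that `node` replicates to the
    replicas that take part when `node` coordinates a repair of it."""
    result = {}
    for owner in range(ring_size):
        reps = replicas(owner, ring_size, rf)
        if node in reps:
            result[owner] = reps
    return result


# Six nodes, RF = 3, repair launched on node 0.
repaired = ranges_repaired_by(0, ring_size=6, rf=3)
participants = sorted({r for reps in repaired.values() for r in reps})
print(repaired)      # {0: [0, 1, 2], 4: [4, 5, 0], 5: [5, 0, 1]}
print(participants)  # [0, 1, 2, 4, 5]
```

In this model a single repair launched on node 0 brings five of the six nodes up to date for the ranges they share with it, yet only node 0 would record a timestamp.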
IMHO, we should hold back on this issue for now and at least wait for CASSANDRA-2610, CASSANDRA-2606 and CASSANDRA-2816 before committing to anything. I agree that having information to help people plan repair is nice, but it is at most a very minor improvement, and exposing a misleading number is more harmful than exposing no number.