In any case, given that leaders are only randomly killed, wouldn't the first be a superset of the other?
One superset test makes it very hard to debug and fix issues. I actually run variations of both of these tests on my Jenkins when hunting down failures, so that I can narrow down which behavior things fail under. I'd have a lot more of them focused on smaller subsets, but even these two get so little time that it's just not worth it yet. Separating out leader election at the high level has proved very helpful so far, though.
Anyway, when the safe leader test fails and the leader kill test does not, you can bet you get to focus on just the recovery-from-leader path. When leaders go down in these tests, it's also often hard to catch an issue, as the leader sync sequence can repair and hide problems.
The failures can be so infrequent that to hunt them you need a test beasting script, a Jenkins running just the chaosmonkey tests, or both. I run them in a few variations, nightly and regular. When my local Jenkins machine is up and running, that is.
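
For anyone who hasn't beasted a test before, here is a minimal sketch of the idea in Python. It is not the script I actually use; the `beast` function and the command-line shape are made up for illustration, and the command you pass in is whatever runs a single test in your build. It just reruns that command N times and records which runs exited non-zero.

```python
import subprocess
import sys


def beast(cmd, iterations):
    """Run cmd repeatedly; return (iteration, returncode) for each failed run."""
    failures = []
    for i in range(1, iterations + 1):
        # Output is discarded here; a real script would likely keep per-run logs.
        result = subprocess.run(
            cmd, stdout=subprocess.DEVNULL, stderr=subprocess.DEVNULL
        )
        if result.returncode != 0:
            failures.append((i, result.returncode))
    return failures


if __name__ == "__main__":
    # Hypothetical usage: python beast.py 50 <command that runs one test>
    iterations = int(sys.argv[1])
    cmd = sys.argv[2:]
    failed = beast(cmd, iterations)
    print(f"{len(failed)} of {iterations} runs failed: {failed}")
    sys.exit(1 if failed else 0)
```

Pointing it at just one of the two chaosmonkey tests at a time is what lets you tell which path the failure is on.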