Let me go into a little more detail here.
When we originally discussed Recovery Mode, one big concern was that system administrators would overuse it to fix issues that would be better addressed in a different way. Of course, it's impossible to prevent all misuse: human beings are not perfect, and any tool can be misused. But that concern is the reason we made Recovery Mode a startup option rather than a configuration setting. It would be too easy for people to set the configuration and then leave it set even after the problem was gone. That's also the reason a NameNode in Recovery Mode exits as soon as it has loaded the edit log and written a new FSImage. This was all discussed in
Obviously, edit log toleration goes against those assumptions, and in a way that, frankly, I think is very dangerous.
Recovery Mode is generally an extensible concept. Since it has nothing to do with the physical structure of the edit log on-disk, it can be extended to handle arbitrary types of corruption. For example, what if you encounter an edit that relies on a directory that doesn't exist (because of corruption earlier in the log)? This is something that recovery mode could conceivably handle by displaying a prompt and asking "would you like to create the parent directory for the directory this edit references?"
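To make that concrete, here is a rough sketch of what such a prompt-driven repair could look like. This is a toy model, not NameNode code: the class `RecoveryPromptSketch`, the methods `replay`, `parentOf`, and `createWithAncestors`, and the idea of representing edits as plain mkdir paths are all hypothetical names and simplifications I'm making up for illustration. The only point is that the decision lives in a callback, so it doesn't care where in the log the problem shows up.

```java
import java.util.List;
import java.util.Set;
import java.util.TreeSet;
import java.util.function.Predicate;

// Hypothetical sketch only: none of these names are real HDFS classes or methods.
class RecoveryPromptSketch {

    // Parent of an absolute path, e.g. "/b/c/d" -> "/b/c" and "/a" -> "/".
    static String parentOf(String path) {
        int slash = path.lastIndexOf('/');
        return slash <= 0 ? "/" : path.substring(0, slash);
    }

    // Adds dir and every missing ancestor of dir to the namespace.
    static void createWithAncestors(Set<String> namespace, String dir) {
        for (String d = dir; !d.equals("/"); d = parentOf(d)) {
            namespace.add(d);
        }
    }

    // Replays simplified "mkdir" edits (each edit is just a directory path).
    // When an edit's parent directory is missing, askUser receives the prompt
    // text and returns true for "yes, create it" or false for "skip this edit".
    static Set<String> replay(List<String> edits, Predicate<String> askUser) {
        Set<String> namespace = new TreeSet<>();
        namespace.add("/");
        for (String path : edits) {
            String parent = parentOf(path);
            if (!namespace.contains(parent)) {
                String prompt = "Edit for " + path + " references missing directory "
                        + parent + ". Would you like to create it?";
                if (!askUser.test(prompt)) {
                    continue; // operator declined: drop this edit and keep going
                }
                createWithAncestors(namespace, parent);
            }
            namespace.add(path);
        }
        return namespace;
    }

    public static void main(String[] args) {
        // "/b/c" was never created, as if an earlier edit had been lost to corruption.
        List<String> edits = List.of("/a", "/b/c/d");
        // Answer "yes" to every prompt; a real tool would ask the operator.
        Set<String> result = replay(edits, prompt -> {
            System.out.println(prompt);
            return true;
        });
        System.out.println(result);
    }
}
```

In this sketch, answering "no" simply drops the offending edit and continues, which is one possible policy among several. A real implementation would have to decide that per edit type.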
Edit log toleration is not extensible. It can only ever handle one type of corruption: tail corruption. But we rarely see tail corruption anymore, since FSEditLog preallocation was improved in branch-1 (HDFS-3596). I can't think of a single case of tail corruption we've seen in the past few months. Many of the cases of corruption we have seen were instances of HDFS-3652, and edit log toleration is inherently useless against that. Missing features can be fixed; inherent uselessness cannot.
And these are just the technical arguments. There are also many convincing process-based arguments. branch-1 is a stable branch. We should be fixing bugs, not making major changes. We should be trying to minimize the divergence between branch-1 and branch-2, not amplify it. People already know how to use Recovery Mode. We're not going to retrain people to use a system that does the same thing and is, in my opinion, more error-prone.
Let's just fix the bugs we have (I have pointed out some in this thread), get stuff working, and focus our efforts on the future, not the past.