Suresh: I think we've already had a meeting ostensibly for this purpose. There is also a design document posted to HDFS-2185. The document doesn't cover every possible scenario, because I don't have infinite foresight, and I don't think more meetings or more reviews of the design doc will change that.
For example, with the original manual-failover project, we had several design meetings as well as a design document posted on HDFS-1623. Looking back at that project, the design document captured the overall idea (as the HDFS-2185 one does here) but did not foresee some of the trickiest issues we dealt with during implementation (for example, how to handle invalidations with regard to datanode fencing, how to handle safe mode, how to deal with delegation tokens, etc.).
In that project, as we came upon each new scenario, we opened a JIRA and discussed the design solution for that particular scenario. I don't see why we can't do the same here. Nor do I see why we would be any better at foreseeing all the corner cases a priori here than we were with the manual-failover project.
So, I am not going to pause work to wait for meetings or more design discussion. If you see problems with the design, please comment on the design doc on HDFS-2185, or on the individual JIRAs that seem to have problems. I'm happy to address them, even after commit (e.g., I'm currently addressing Bikas's review comments on HADOOP-8212).
Since there seems to be concern that we are moving too fast, I will create an auto-failover branch later tonight and continue implementing this design there. I'll also create a new auto-failover component on JIRA so the related issues are easier to follow. If you have concerns about the implementation or the design when it comes time to merge, please do vote against the merge and voice whatever objections you have. And please do comment along the way if you see issues.