When I read K's paper, I found that it generally fits the model discussed among ourselves and in HDFS:1623. I would really consider it a specialization of that model. I've re-read both K's and Sanjay+Suresh's papers fairly carefully to get a better sense of the differences.
1. In K7.8 there is a discussion of whether the NN should more proactively look for missing replicas. I don't remember discussing this, but my first thought is that this is an instance of trying too hard at the margin. During what fraction of the system's lifetime would this help?
2. K7.6 mentions turning off lease recovery, but also replication checking as in SS9.3.1.
3. What is the scope of VIP solutions? This is the "single switch" question. A while ago, we got into trouble when VIP did not just work with HDFS. More recently we got into trouble when DNS resolution was cached, but when I asked Rajiv why VIP wasn't the answer, he said that they could not (in general) provide an alternate host with the NN's VIP. Is everybody confident that single-switch VIP works well enough for HA? (When I ask that question of Rajiv, he says yes.)
4. We've been very anxious about the stale deletion request problem where a DN has a request from the old NN that has not been reported to the NN now in service. This is hinted at in SS9.1.2, but I don't think this is fully understood yet. SS goes further into the topic of "data node fencing." Sanjay and I disagree on the merits. I'd argue that DNs should just do as they are told, and not try to mediate sibling disputes among NNs.
5. NN arbitration really is important. K hints somewhere (can't find it now) that the old NN must be stopped. SS are more emphatic. I'd say do STONITH. This becomes more important anywhere near the word "automatic."
6. SS9.2 mentions "leader election". Is the world really symmetric? XXX^1^ denied that symmetry was a good thing. Any specific proposal needs to address the question of how alike the first and second systems are, and whether the process runs backward.
7. The Load Replicator in K6.2 is a new contribution to the discussion. This bears on issue #4, above.
8. Where K really diverges from most discussion here is over the question of Backup name node versus spooling edits on secondary storage. I mostly understand the issues, but in a practical BN deployment, is there a remaining need for some shared storage?
Why, yes, you could do differently, but a practical solution has:
- No Zookeeper
- Only transfer one way without administrator intervention
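To make the combination of "stop the old NN first" (issue #5) and "only transfer one way" concrete, here is a minimal sketch. It is not taken from either paper: `OneWayFailover`, `fence_old_nn`, `promote_backup`, and `admin_reset` are hypothetical names, and the fencing and promotion steps are stand-in callables rather than real HDFS operations.

```python
from dataclasses import dataclass, field
from typing import Callable, List


class FailoverRefused(Exception):
    """Raised when an automatic transfer is not allowed."""


@dataclass
class OneWayFailover:
    # Hypothetical hooks: fence_old_nn returns True only when the old NN
    # is known to be stopped (e.g. via STONITH); promote_backup makes the
    # second system the serving NN.
    fence_old_nn: Callable[[], bool]
    promote_backup: Callable[[], None]
    used: bool = False
    log: List[str] = field(default_factory=list)

    def fail_over(self) -> None:
        # Only one automatic transfer; going back needs an administrator.
        if self.used:
            raise FailoverRefused("already failed over; administrator must reset")
        # Never promote unless the old NN is confirmed stopped.
        if not self.fence_old_nn():
            raise FailoverRefused("could not confirm the old NN is stopped")
        self.used = True
        self.log.append("fenced")
        self.promote_backup()
        self.log.append("promoted")

    def admin_reset(self) -> None:
        # Explicit administrator action re-arms the automatic transfer.
        self.used = False


if __name__ == "__main__":
    ctl = OneWayFailover(fence_old_nn=lambda: True,
                         promote_backup=lambda: print("backup promoted"))
    ctl.fail_over()
    try:
        ctl.fail_over()
    except FailoverRefused as exc:
        print("refused:", exc)
```

The point of the sketch is only the ordering and the asymmetry: fencing precedes promotion, and a second automatic transfer is refused until someone intervenes.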
The open argument in my mind is BN versus spool-to-disk. Oh, and if the LR
really means DNs need not know that there are multiple servers, life is
1 My memory is uncertain about the proper attribution here.