We don't support multiple shared edits dirs, so we should fail to start with an error in this case.
The other issue is that, in order to support multiple shared edits dirs, we'd need quorum-like behavior rather than the current "at least one" behavior. Otherwise, with two shared dirs (SD1 and SD2) and two NNs, you could end up with NN1 writing only to SD1 while NN2 reads only from SD2, so NN2 would never see NN1's edits. Let's continue this discussion on HDFS-2782.
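To make the SD1/SD2 scenario concrete, here is a minimal sketch of the difference between an "at least one" rule and a majority rule. It is only an illustration: the class and method names are hypothetical and none of this is the actual JournalSet or EditLogTailer code.

```java
import java.util.Set;

// Hypothetical illustration only; not part of HDFS.
final class SharedDirQuorumSketch {

    // "At least one" rule: a write counts as successful if any shared dir accepted it.
    static boolean atLeastOne(int ackedDirs) {
        return ackedDirs >= 1;
    }

    // Majority rule: a write counts only if more than half of the shared dirs accepted it.
    static boolean majority(int ackedDirs, int totalDirs) {
        return ackedDirs > totalDirs / 2;
    }

    // A reader can only see a write if it reads at least one dir that was written.
    static boolean readerSeesWrite(Set<String> writtenDirs, Set<String> readDirs) {
        for (String dir : readDirs) {
            if (writtenDirs.contains(dir)) {
                return true;
            }
        }
        return false;
    }
}
```

With two shared dirs, a write acked only by SD1 passes `atLeastOne(1)` but fails `majority(1, 2)`, and a reader that reads only SD2 does not see it. With only two dirs a majority means both of them, which is part of why this needs more thought than simply swapping the rule.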
I agree with Eli - we don't currently use the JournalSet abstraction in EditLogTailer, so it can only use a single shared dir. Of course in the future we should support using multiple, but it adds some complexity to the initial release.
EditLogTailer uses FSEditLog, which uses JournalSet. I think it should be able to handle multiple shared edits dirs, unless there is another bug.
Integrated in Hadoop-Hdfs-HAbranch-build #70 (See https://builds.apache.org/job/Hadoop-Hdfs-HAbranch-build/70/)
HDFS-2752. HA: exit if multiple shared dirs are configured. Contributed by Eli Collins
eli : http://svn.apache.org/viewcvs.cgi/?root=Apache-SVN&view=rev&rev=1240916
Thanks for the review, Todd. Fixed the nit and committed.
Just one nit: you can remove the following empty javadoc annotation:
+ * @param conf
Aside from that, +1.
Patch attached. Running the full test suite for sanity. I left the dupe detection for shared dirs in place since we'll need it when we add multiple shared dir support.
Sorry for the lack of context, this came up in HDFS-2709. With multiple shared edits dirs, a failure to read from one of them will prevent the edit log tailer from catching up, i.e. a setup with multiple shared dirs is currently less reliable than one with a single shared dir. Until we know we've got it working reasonably well (i.e. have tests for the common scenarios), it doesn't seem like we should let people shoot themselves in the foot. And while nice to have, it doesn't seem like multiple shared dir support should block an initial release. Agree? Perhaps warn loudly instead of exit?
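For reference, the two options (exit vs. warn loudly) could look roughly like the sketch below. This is not the attached patch: the class, the `Policy` enum, and both methods are hypothetical, and the input is assumed to be a plain comma-separated list of directory URIs.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical illustration only; not the attached patch.
final class SharedEditsDirCheckSketch {

    enum Policy { FAIL, WARN }

    // Splits a comma-separated list of dirs, trimming whitespace and skipping empty entries.
    static List<String> parseDirs(String value) {
        List<String> dirs = new ArrayList<>();
        if (value == null) {
            return dirs;
        }
        for (String part : value.split(",")) {
            String trimmed = part.trim();
            if (!trimmed.isEmpty()) {
                dirs.add(trimmed);
            }
        }
        return dirs;
    }

    // Returns null when zero or one shared dir is configured.
    // With more than one: throws under FAIL, or returns a message for the caller to log under WARN.
    static String check(List<String> sharedDirs, Policy policy) {
        if (sharedDirs.size() <= 1) {
            return null;
        }
        String msg = "Multiple shared edits directories are not yet supported: " + sharedDirs;
        if (policy == Policy.FAIL) {
            throw new IllegalArgumentException(msg);
        }
        return msg;
    }
}
```

Calling `check(parseDirs(value), Policy.FAIL)` at startup refuses to start with two dirs configured, while `Policy.WARN` leaves it to the caller to log the message and continue.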
Eli, I am a bit confused. What special handling is required to support multiple shared directories that is not handled currently? Why turn it off just because the tests are not there?
It's not an error currently, but it should be until we support it (see HDFS-2735, which needs tests and fixes).
Why is this an error?