|
One aproach would be to wrap LDAP, Zookeeper, etc via an implementation of org.apache.hadoop.FileSystem.
I think it goes without saying that a pluggable system would be great here. LDAP and ZK are just two examples of many that could be used to provide this sort of functionality. Heck, no reason it couldn't be as simple as a HTTP GET.
[I'll admit i'm slightly partial to LDAP here due to the requirement to provide a schema. It provides a rudimentary way to enforce some basic rules on the data prior to Hadoop even seeing it.] Do you think LDAP will scale if 2000 data-nodes will start reading their conf at once?
Like Hadoop, ZK, webservices, etc, LDAP's scale is implementation dependent.
I'd love to be able to specify a configuration URL, instead of distributing hadoop-site.xml all over. Presumably you would specify the URL as "http://host/getConfig" and Hadoop would add some parameters, like "http://host/getConfig?host=xyz&name=foo".
Wrapping filesystem around ldap/zk/etc seems a bit heavy. Another option would be to extend Configuration and create ZooKeeperConfiguration, LDAPConfiguration, HTTPConfiguration, etc. Each node would still require a small file to tell it where/which resource to locate. A builder could read this file and construct the appropriate configuration and hand it off to the node. This would be very pluggable and transparent to the nodes.
Centralized configuration facility adds one more single point of failure. If LDAP server fails - everything fails.
I am not arguing against centralized configurators in general. Just pointing out possible issues. Configuration may be just the first step. Then we may want to build centralize hadoop distribution repositories from which cluster nodes can pick required jars, etc. > Each node would still require a small file to tell it where/which resource to locate.
Yes and that makes me think that something is not right here: in order obtain data-node configuration I need to specify the location of the configuration server and still distribute this configuration file to 2000 nodes. The amount of churn in a configuration file that only points to where to read the real configuration data is significantly lower than changes in Hadoop's configuration in my experience. For example, our system level ldap.conf has changed maybe three times in 1 year, and most of those were growing pains of a new infrastructure/data center. I have lost track of how many times we have needed to change and bounce the jobtracker ACLs in the past week.
Additionally, HTTP, LDAP, etc, have real, proven HA solutions. They are much less likely to go down than the Hadoop grids they would be supporting.... Another nice thing about this is that you can use DNS for a super-simple HA/metaconfig system. Simply point all the hadoop nodes at http://confmaster/config
making a case for zookeeper
Also, this could sow the seeds of a true federation of hdfs clusters!
I like the zookeeper option, especially the HA part.
– amr Couple of suggestions (assuming zookeeper option) based on past experience with similar issue.
1. Splitting configuration into two parts - default and site-specific overrides (with further specialization down the line) - would simplify hadoop upgrade by minimizing upgrade impact on locally overridden options. 2. If zookeeper does not provide for node-local persistent caching, adding this option specifically for configuration data could support disconnected node operations if need be (say, for sanity checks on start-up). #1 could be done no matter what the source. It just depends upon how smart the plug in framework and the actual plug in is. For example, assuming an HTTP plug in: it would just fetch two files and do the merge just like Hadoop configures things today.
#2 should be done no matter what the source. The question is whether it should be handled inside or outside the plug in. The follow up to this bug is really: how do you build a registry of configs and have the client smart enough to know which entry it needs to follow. So that might need to be part of the design here. > For example, assuming an HTTP plug in: it would just fetch two files and do the merge
> just like Hadoop configures things today. Yes, this is exactly what a simplistic solution would do. I'd rather not limit myself to two files, though: why not get ready for federated grids in advance. Further, one has to take some care with option semantics: for instance, override of some default options may turn undesirable or even should be prohibited. There some nits here, to summarize. > [For example, if I have a client machine that needs to submit jobs to two different grids, Thoughts? XInclude from a remote filesystem? I'm worried that XML parsers won't
handle errors well. Perhaps it could be made to work. There are two pieces to this ticket anyway. 1. read config files from I had some conversations with Doug on this in the summer, but I never Tom Given ZK's status as a subproject, where does ZK read its configuration data from? Do we have a chicken and egg problem?
ZK uses Java properties files, so it has no dependencies on Hadoop Core configuration.
The bootstrapping problem is that FileSystem relies on a Configuration object having loaded its resources. So to get a Configuration to load from a FileSystem won't work, the way things stand (see Configuration#loadResource). (It's possible that the reason for this code has gone away, or at least lessened, following the recent split of configurations into core, hdfs and mapred - it might be possible to bootstrap from the core config.) There's no inherent reason, though, why a Configuration couldn't load its resources from HDFS or a ZK-backed FileSystem. We just need to avoid a system loading resources from itself.
+1 > The idea was to expose ZK as a Hadoop FileSystem
Like I mentioned earlier in this JIRA, I really like this idea. I have an app that currently stores data on HDFS, but I would love to make it run on zookeeper without changing my application code.
I am willing to get involved in this, as config is part of management, and I probably have the most dynamic configuration setup to date. But I'd rather wait until after the service lifecycle stuff is in, so change can be handled more gradually. If any configuration changes were to go into 0.21, I'd opt for a factory mechanism for creating new Configuration instances, so that people exploring configuration options can control things better. > until after the service lifecycle stuff is in
Could you please post a link to discussion/ticket, I'd like to take a look. Thanks, Looking at my subclassed configuration code http://smartfrog.svn.sourceforge.net/viewvc/smartfrog/trunk/core/components/hadoop/src/org/smartfrog/services/hadoop/conf/ManagedConfiguration.java?view=markup
Some people have mentioned that they'd like daemons to re-read their config more dynamically. That can get very confusing, as you may have some config values (ports, network cards, temp directories) whose changes only get picked up by a full restart. There are two tactics we've used in the past for this
@Andrey, I was referring to HADOOP-3628. the changes are orthogonal, its just that I want to get the service stuff done before I spend any of my time worrying about what we can do to make configuration easier. With the subclassing working, I am happy short term, though I think it could be done more cleanly.
One important and related feature are the necessary access-control mechanisms for reloading configurations, should we track it separately? Maybe HADOOP-5772 ?
I like the way glusterfs handles this.
On the glusterfs server it holds all the client configurations in a configuration file. The cluster client can either be started up with a local configuration file or be given an IP/PORT and it will download its configuration from the server. Datanode should probably pull it its configuration from the namenode. On the practicality side, large numbers of server will likely share a configuration. We should have a concept of a default configuration, and then per hosts overrides, and a way of saying groups of servers all share the same configuration file. I think with reloads a deamon like a DataNode could theoretically check the configuration and then make its own decision on if could apply the change, or if it should restart because of the change. I wish I had pictures of the whiteboard talk Owen and I had around this feature.
One the big things I want people to keep in mind is that a client of the grid needs a lot of this information too. [e.g., something like default block size] There is also the issue of multiple JobTrackers per file system. You can do lovely things with a hierarchy of configuration information. |
||||||||||||||||||||||||||||||||||||||||||||||
Agreed. It would still be necessary to support local configuration for single-node clusters and testing though. The configuration should be able to take its values either from a distributed system, or from local values. ZooKeeper would be a reasonable system for retrieving/setting the key-value pairs of the configuration file.