Issue Details (XML | Word | Printable)

Key: HADOOP-5670
Type: New Feature New Feature
Status: Open Open
Priority: Major Major
Assignee: Unassigned
Reporter: Allen Wittenauer
Votes: 0
Watchers: 25
Operations

If you were logged in you would be able to see more operations.
Hadoop Common

Hadoop configurations should be read from a distributed system

Created: 14/Apr/09 07:43 PM   Updated: 29/Oct/09 06:53 PM
Return to search
Component/s: conf
Affects Version/s: None
Fix Version/s: None

Time Tracking:
Not Specified


 Description  « Hide
Rather than distributing the hadoop configuration files to every data node, compute node, etc, Hadoop should be able to read configuration information (dynamically!) from LDAP, ZooKeeper, whatever.

 All   Comments   Work Log   Change History   Subversion Commits      Sort Order: Ascending order - Click to sort in descending order
Jakob Homan added a comment - 14/Apr/09 08:20 PM
Odd. I replied as a comment but jira didn't seem to pick it up...

Agreed. It would still be necessary to support local configuration for single-node clusters and testing though. The configuration should be able to take its values either from a distributed system, or from local values. ZooKeeper would be a reasonable system for retrieving/setting the key-value pairs of the configuration file.


dhruba borthakur added a comment - 14/Apr/09 08:37 PM
One aproach would be to wrap LDAP, Zookeeper, etc via an implementation of org.apache.hadoop.FileSystem.

Allen Wittenauer added a comment - 14/Apr/09 09:09 PM
I think it goes without saying that a pluggable system would be great here. LDAP and ZK are just two examples of many that could be used to provide this sort of functionality. Heck, no reason it couldn't be as simple as a HTTP GET.

[I'll admit i'm slightly partial to LDAP here due to the requirement to provide a schema. It provides a rudimentary way to enforce some basic rules on the data prior to Hadoop even seeing it.]


Konstantin Shvachko added a comment - 15/Apr/09 12:09 AM
Do you think LDAP will scale if 2000 data-nodes will start reading their conf at once?

Allen Wittenauer added a comment - 15/Apr/09 03:35 PM
Like Hadoop, ZK, webservices, etc, LDAP's scale is implementation dependent.

Philip Zeyliger added a comment - 15/Apr/09 07:24 PM
I'd love to be able to specify a configuration URL, instead of distributing hadoop-site.xml all over. Presumably you would specify the URL as "http://host/getConfig" and Hadoop would add some parameters, like "http://host/getConfig?host=xyz&name=foo".

Jakob Homan added a comment - 15/Apr/09 07:28 PM
Wrapping filesystem around ldap/zk/etc seems a bit heavy. Another option would be to extend Configuration and create ZooKeeperConfiguration, LDAPConfiguration, HTTPConfiguration, etc. Each node would still require a small file to tell it where/which resource to locate. A builder could read this file and construct the appropriate configuration and hand it off to the node. This would be very pluggable and transparent to the nodes.

Konstantin Shvachko added a comment - 15/Apr/09 09:06 PM
Centralized configuration facility adds one more single point of failure. If LDAP server fails - everything fails.
I am not arguing against centralized configurators in general. Just pointing out possible issues.

Configuration may be just the first step. Then we may want to build centralize hadoop distribution repositories from which cluster nodes can pick required jars, etc.


Konstantin Shvachko added a comment - 15/Apr/09 09:10 PM
> Each node would still require a small file to tell it where/which resource to locate.

Yes and that makes me think that something is not right here: in order obtain data-node configuration I need to specify the location of the configuration server and still distribute this configuration file to 2000 nodes.


Allen Wittenauer added a comment - 15/Apr/09 09:35 PM
The amount of churn in a configuration file that only points to where to read the real configuration data is significantly lower than changes in Hadoop's configuration in my experience. For example, our system level ldap.conf has changed maybe three times in 1 year, and most of those were growing pains of a new infrastructure/data center. I have lost track of how many times we have needed to change and bounce the jobtracker ACLs in the past week.

Additionally, HTTP, LDAP, etc, have real, proven HA solutions. They are much less likely to go down than the Hadoop grids they would be supporting....


Todd Lipcon added a comment - 15/Apr/09 09:41 PM
Another nice thing about this is that you can use DNS for a super-simple HA/metaconfig system. Simply point all the hadoop nodes at http://confmaster/config, then set up CNAME or A records on the local domain network to the right machines. If one confmaster is down, it'll just pull from another.

Mahadev konar added a comment - 16/Apr/09 03:17 AM
making a case for zookeeper , to write a configuration layer on top of zookeeper would be just 20-30 lines of code and we can handle around >20K writes per second (with 3 servers, which I dont think would be necessary but other apps also could use zookeeper) and >50K reads/sec. Also zookeeper has a pretty strong HA story.

dhruba borthakur added a comment - 16/Apr/09 04:37 AM
Also, this could sow the seeds of a true federation of hdfs clusters!

Amr Awadallah added a comment - 16/Apr/09 05:26 AM
I like the zookeeper option, especially the HA part.

– amr


Andrey Kuzmin added a comment - 16/Apr/09 12:39 PM
Couple of suggestions (assuming zookeeper option) based on past experience with similar issue.

1. Splitting configuration into two parts - default and site-specific overrides (with further specialization down the line) - would simplify hadoop upgrade by minimizing upgrade impact on locally overridden options.

2. If zookeeper does not provide for node-local persistent caching, adding this option specifically for configuration data could support disconnected node operations if need be (say, for sanity checks on start-up).


Allen Wittenauer added a comment - 16/Apr/09 06:07 PM
#1 could be done no matter what the source. It just depends upon how smart the plug in framework and the actual plug in is. For example, assuming an HTTP plug in: it would just fetch two files and do the merge just like Hadoop configures things today.

#2 should be done no matter what the source. The question is whether it should be handled inside or outside the plug in.

The follow up to this bug is really: how do you build a registry of configs and have the client smart enough to know which entry it needs to follow. So that might need to be part of the design here. [For example, if I have a client machine that needs to submit jobs to two different grids, how can it automagically pull the proper configuration information for those two grids? ]


Andrey Kuzmin added a comment - 16/Apr/09 08:16 PM
> For example, assuming an HTTP plug in: it would just fetch two files and do the merge
> just like Hadoop configures things today.
Yes, this is exactly what a simplistic solution would do. I'd rather not limit myself to two files, though: why not get ready for federated grids in advance. Further, one has to take some care with option semantics: for instance, override of some default options may turn undesirable or even should be prohibited. There some nits here, to summarize.

> [For example, if I have a client machine that needs to submit jobs to two different grids,
> how can it automagically pull the proper configuration information for those two grids? ]
I didn't actually consider clients. If ZK supports client connections (meaning outside-world readers and may be even writers), not sure with this - just started reading docs/code last week, - this should be fairly straightforward. The only thing one then needs at the client to answer your concern would be a simple "AvailableGrids" config listing respective ZK URLs to connect to.


Philip Zeyliger added a comment - 17/Apr/09 06:05 AM
HADOOP-4944 contains a patch to use "xinclude" in configuration files. That seems functionally similar to the goals of this ticket.

Thoughts?


Tom White added a comment - 17/Apr/09 08:05 AM
XInclude from a remote filesystem? I'm worried that XML parsers won't
handle errors well. Perhaps it could be made to work.

There are two pieces to this ticket anyway. 1. read config files from
an arbitrary filesystem. 2. get daemons to re-read config more
dynamically - most of them just do so once at startup. I would have
separate issues for these.

I had some conversations with Doug on this in the summer, but I never
got round to even opening a ticket . The idea was to expose ZK as a
Hadoop FileSystem then change Configuration to load from an arbitrary
FS. There is a bootstraping issue to get round too.

Tom


Allen Wittenauer added a comment - 17/Apr/09 02:50 PM
Given ZK's status as a subproject, where does ZK read its configuration data from? Do we have a chicken and egg problem?

Tom White added a comment - 17/Apr/09 03:28 PM
ZK uses Java properties files, so it has no dependencies on Hadoop Core configuration.

The bootstrapping problem is that FileSystem relies on a Configuration object having loaded its resources. So to get a Configuration to load from a FileSystem won't work, the way things stand (see Configuration#loadResource). (It's possible that the reason for this code has gone away, or at least lessened, following the recent split of configurations into core, hdfs and mapred - it might be possible to bootstrap from the core config.) There's no inherent reason, though, why a Configuration couldn't load its resources from HDFS or a ZK-backed FileSystem. We just need to avoid a system loading resources from itself.


Mahadev konar added a comment - 17/Apr/09 04:43 PM

There are two pieces to this ticket anyway. 1. read config files from
an arbitrary filesystem. 2. get daemons to re-read config more
dynamically - most of them just do so once at startup. I would have
separate issues for these.

+1


dhruba borthakur added a comment - 17/Apr/09 05:24 PM
> The idea was to expose ZK as a Hadoop FileSystem

Like I mentioned earlier in this JIRA, I really like this idea. I have an app that currently stores data on HDFS, but I would love to make it run on zookeeper without changing my application code.


Steve Loughran added a comment - 23/Apr/09 01:23 PM
  1. Apache Directory implements an NFS filesystem front end to LDAP data, so you could today use it to provide all of the configuration data in a single point of failure/HA directory service
  2. I do subclass Configuration for my work, with a ManagedConfiguration being read from SmartFrog, rather than the filesystem. I have to pass an instance of this down when any service gets created.
  3. These Configurations get serialised and sent around with Job submissions, so they soon become static snapshots of configuration when jobs were queued, not when they were executed. You can't be dynamic without changing this behaviour.
  4. There are 350+ places in the Hadoop codebase where something calls the {{new Configuration()} operation, to create a config purely from the registered files. This is bad; some kind of factory approach would be better, with the factory settable on a per-JVM or per-thread basis.
  5. HADOOP-3582 collects other requirements for configuration.

I am willing to get involved in this, as config is part of management, and I probably have the most dynamic configuration setup to date. But I'd rather wait until after the service lifecycle stuff is in, so change can be handled more gradually. If any configuration changes were to go into 0.21, I'd opt for a factory mechanism for creating new Configuration instances, so that people exploring configuration options can control things better.


Andrey Kuzmin added a comment - 23/Apr/09 01:40 PM
> until after the service lifecycle stuff is in
Could you please post a link to discussion/ticket, I'd like to take a look.

Thanks,
Andrey


Steve Loughran added a comment - 23/Apr/09 01:52 PM
Looking at my subclassed configuration code http://smartfrog.svn.sourceforge.net/viewvc/smartfrog/trunk/core/components/hadoop/src/org/smartfrog/services/hadoop/conf/ManagedConfiguration.java?view=markup I am reminded of some things
  1. It subclasses JobConf, as some things expect to be handed a JobConf
  2. In the past, I did have it live, bound to the SmartFrog configuration graph, so (until serialized) every change in the SF configuration was reflected in the result of the get() method.
  3. That was pretty complex, I had to subclass a lot of methods that assumed that all state was stored statically in two properties files.
  4. It proved easier to comment the overrides out and move to grabbing all the values once, and no longer being live.

Some people have mentioned that they'd like daemons to re-read their config more dynamically. That can get very confusing, as you may have some config values (ports, network cards, temp directories) whose changes only get picked up by a full restart. There are two tactics we've used in the past for this

  1. Have your service support a "reload" operation which loads in all values and resets thing
  2. Just restart the service
    Provided the cost of restarting is low, restarting services is way, way easier.

Steve Loughran added a comment - 23/Apr/09 02:27 PM
@Andrey, I was referring to HADOOP-3628. the changes are orthogonal, its just that I want to get the service stuff done before I spend any of my time worrying about what we can do to make configuration easier. With the subclassing working, I am happy short term, though I think it could be done more cleanly.

Arun C Murthy added a comment - 05/May/09 09:00 PM - edited
One important and related feature are the necessary access-control mechanisms for reloading configurations, should we track it separately? Maybe HADOOP-5772 ?

Edward Capriolo added a comment - 29/Oct/09 01:58 AM
I like the way glusterfs handles this.

http://www.gluster.com/community/documentation/index.php/Client_Installation_and_Configuration#Manually_Mounting_a_Volume

On the glusterfs server it holds all the client configurations in a configuration file. The cluster client can either be started up with a local configuration file or be given an IP/PORT and it will download its configuration from the server.

Datanode should probably pull it its configuration from the namenode.
Tasktracker should pull its configuration from jobtracker

On the practicality side, large numbers of server will likely share a configuration. We should have a concept of a default configuration, and then per hosts overrides, and a way of saying groups of servers all share the same configuration file.

I think with reloads a deamon like a DataNode could theoretically check the configuration and then make its own decision on if could apply the change, or if it should restart because of the change.


Allen Wittenauer added a comment - 29/Oct/09 06:53 PM
I wish I had pictures of the whiteboard talk Owen and I had around this feature.

One the big things I want people to keep in mind is that a client of the grid needs a lot of this information too. [e.g., something like default block size]

There is also the issue of multiple JobTrackers per file system.

You can do lovely things with a hierarchy of configuration information.