Affects Version/s: 2.0.2-alpha, 0.23.5
Fix Version/s: None
Hi, in contrast to the easy-to-follow 1.0.4 instructions (http://hadoop.apache.org/docs/r1.0.4/single_node_setup.html), the 0.23.5 and 2.0.2-alpha instructions (http://hadoop.apache.org/docs/r2.0.2-alpha/hadoop-yarn/hadoop-yarn-site/SingleCluster.html) need more clarification – they seem to be written for people who already know and understand Hadoop. In particular, these points need attention:
1.) Quote: "You should be able to obtain the MapReduce tarball from the release."
Question: What is the MapReduce tarball, and what is its name? I don't see any such file inside the hadoop-0.23.5.tar.gz download.
2.) Quote: "NOTE: You will need protoc installed of version 2.4.1 or greater."
Protoc doesn't seem to have a website you can link to (it's only mentioned in passing when you Google it) – is it really the case today that Hadoop has a dependency on such a minor project? At any rate, a link to wherever one goes to get and install protoc would be good.
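In the meantime, here is a sketch of how one might at least sanity-check the protoc on the PATH from a shell. The version_ge helper is my own, not part of any Hadoop tooling, and it assumes a sort that supports -V:

```shell
# version_ge A B: succeed when version A >= version B.
# Helper written for this report; assumes GNU-style `sort -V`.
version_ge() {
  [ "$(printf '%s\n%s\n' "$2" "$1" | sort -V | head -n1)" = "$2" ]
}

# Check whatever protoc is on the PATH, if any.
if command -v protoc >/dev/null 2>&1; then
  ver="$(protoc --version | awk '{print $2}')"   # e.g. "libprotoc 2.4.1"
  if version_ge "$ver" 2.4.1; then
    echo "protoc $ver is new enough"
  else
    echo "protoc $ver is older than 2.4.1"
  fi
else
  echo "protoc not found on PATH"
fi
```

Something like this in the docs would save newcomers a round of trial and error.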
3.) Quote: "Assuming you have installed hadoop-common/hadoop-hdfs and exported $HADOOP_COMMON_HOME/$HADOOP_HDFS_HOME, untar hadoop mapreduce tarball and set environment variable $HADOOP_MAPRED_HOME to the untarred directory."
I'm not sure what the forward slashes mean in "hadoop-common/hadoop-hdfs" and "$HADOOP_COMMON_HOME/$HADOOP_HDFS_HOME" – do you mean "and" (install both), or "or" (install just one of the two)? Please replace the slash with whatever is actually meant.
The audience here is complete newbies who were sent to this page from http://hadoop.apache.org/docs/r0.23.5/ (same with r2.0.2-alpha/), which says: "Getting Started - The Hadoop documentation includes the information you need to get started using Hadoop. Begin with the Single Node Setup which shows you how to set up a single-node Hadoop installation." They have downloaded hadoop-0.23.5.tar.gz and want to know what to do next. Why are there potentially two applications – hadoop-common and hadoop-hdfs – and not just one? (The download doesn't appear to contain two separate apps.) If there is indeed just one app, can we remove the other from the text above to avoid confusion?
Again, I just downloaded hadoop-0.23.5.tar.gz – do I need to download more? If so, let us know in the docs here.
Also, regarding the fragment "Assuming you have installed hadoop-common/hadoop-hdfs...": no, I haven't – that is exactly what this page is supposed to explain. How do I install these two (or just one of them)?
Also, what do I set $HADOOP_COMMON_HOME and/or $HADOOP_HDFS_HOME to?
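For the record, here is my working guess: since there is only the one tarball, point all of the *_HOME variables at the same untarred directory. The location under $HOME below is my assumption, not something from the docs:

```shell
# Assumption: hadoop-common, hadoop-hdfs and hadoop-mapreduce all ship in the
# single hadoop-0.23.5 tarball, so every *_HOME can point at the same
# untarred directory. The location under $HOME is illustrative only.
HADOOP_PREFIX="$HOME/hadoop-0.23.5"
export HADOOP_COMMON_HOME="$HADOOP_PREFIX"
export HADOOP_HDFS_HOME="$HADOOP_PREFIX"
export HADOOP_MAPRED_HOME="$HADOOP_PREFIX"
```

If that is what's intended, spelling it out like this in the docs would remove the ambiguity entirely.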
4.) Quote: "NOTE: The following instructions assume you have hdfs running." No, I don't – how do I do this? Again, this page is supposed to teach me that.
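My best guess at what "have hdfs running" means for a single node, pieced together from the 1.0.4 instructions, is something like the sketch below. The daemon commands are assumptions about the 0.23 layout, guarded so nothing runs if the binary isn't actually there:

```shell
# Assumed 0.23-style layout under $HADOOP_HDFS_HOME; guarded so this is a
# no-op message when the distribution is not actually installed.
if [ -x "$HADOOP_HDFS_HOME/bin/hdfs" ]; then
  "$HADOOP_HDFS_HOME/bin/hdfs" namenode -format            # one-time format
  "$HADOOP_HDFS_HOME/sbin/hadoop-daemon.sh" start namenode
  "$HADOOP_HDFS_HOME/sbin/hadoop-daemon.sh" start datanode
else
  echo "hdfs not found under \$HADOOP_HDFS_HOME - set it first"
fi
```

Whether this is right or not, the page should include the actual commands rather than assuming they're known.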
5.) Quote: "To start the ResourceManager and NodeManager, you will have to update the configs. Assuming your $HADOOP_CONF_DIR is the configuration directory..."
Could you clarify what the "configuration directory" is? It doesn't exist in the 0.23.5 download – I just see bin, etc, include, lib, libexec, sbin, and share folders, but no "conf" one.
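For anyone else stuck here, my guess is that the old conf/ directory moved to etc/hadoop/ in the new layout, so something like this may be what's intended – the path is an assumption on my part:

```shell
# Assumption: the etc/hadoop directory inside the untarred distribution is
# the "configuration directory" the 0.23.5 docs mean by $HADOOP_CONF_DIR.
export HADOOP_CONF_DIR="$HOME/hadoop-0.23.5/etc/hadoop"
```

If that's correct, one sentence in the docs saying "conf/ is now etc/hadoop/" would resolve this point.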
6.) Quote: "Assuming that the environment variables $HADOOP_COMMON_HOME, $HADOOP_HDFS_HOME, $HADOO_MAPRED_HOME, $YARN_HOME, $JAVA_HOME and $HADOOP_CONF_DIR have been set appropriately."
We'll need to know what to set $YARN_HOME to here.
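My working assumption, since YARN appears to ship in the same tarball, is that $YARN_HOME can simply point at the same untarred directory as the other *_HOME variables – but the docs should confirm this:

```shell
# Assumption: YARN is part of the one hadoop-0.23.5 tarball, so YARN_HOME
# points at the same untarred directory; the path is illustrative only.
export YARN_HOME="$HOME/hadoop-0.23.5"
```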