Hadoop Common
HADOOP-9206

"Setting up a Single Node Cluster" instructions need improvement in 0.23.5/2.0.2-alpha branches

    Details

    • Type: Bug
    • Status: Open
    • Priority: Major
    • Resolution: Unresolved
    • Affects Version/s: 2.0.2-alpha, 0.23.5
    • Fix Version/s: None
    • Component/s: documentation
    • Labels: None

      Description

      Hi, in contrast to the easy-to-follow 1.0.4 instructions (http://hadoop.apache.org/docs/r1.0.4/single_node_setup.html), the 0.23.5 and 2.0.2-alpha instructions (http://hadoop.apache.org/docs/r2.0.2-alpha/hadoop-yarn/hadoop-yarn-site/SingleCluster.html) need more clarification – they seem to be written for people who already know and understand Hadoop. In particular, these points need clarification:

      1.) Text: "You should be able to obtain the MapReduce tarball from the release."

      Question: What is the MapReduce tarball? What is its name? I don't see such an object within the hadoop-0.23.5.tar.gz download.

      2.) Quote: "NOTE: You will need protoc installed of version 2.4.1 or greater."

      Protoc doesn't seem to have an obvious website to link to (it only turns up in passing when you Google it) – is it really the case today that Hadoop depends on such a minor project? In any case, a link to wherever one goes to get and install protoc would be good.
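For what it's worth, once protoc is on the PATH its version can be checked against the 2.4.1 minimum with a small shell helper. This is only a sketch; `version_ge` is a hypothetical name, not something from the Hadoop docs:

```shell
# version_ge A B: succeeds when dotted version A is >= B.
# Sketch only; assumes plain numeric dotted versions like "2.4.1".
version_ge() {
  [ "$(printf '%s\n%s\n' "$2" "$1" | sort -t. -k1,1n -k2,2n -k3,3n | tail -n1)" = "$1" ]
}

# Example: compare the locally installed protoc (if any) against 2.4.1.
# "protoc --version" prints something like "libprotoc 2.5.0".
installed="$(protoc --version 2>/dev/null | awk '{print $2}')"
if [ -n "$installed" ] && version_ge "$installed" 2.4.1; then
  echo "protoc $installed is new enough"
fi
```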

      3.) Quote: "Assuming you have installed hadoop-common/hadoop-hdfs and exported $HADOOP_COMMON_HOME/$HADOOP_HDFS_HOME, untar hadoop mapreduce tarball and set environment variable $HADOOP_MAPRED_HOME to the untarred directory."

      I'm not sure what the forward slashes mean in "hadoop-common/hadoop-hdfs" and "$HADOOP_COMMON_HOME/$HADOOP_HDFS_HOME" – do you mean "and" (install both) or "or" (install just one of the two)? This needs clarification; please replace the forward slash with whatever is actually meant.

      The audience here is complete newbies. They were sent to this page from http://hadoop.apache.org/docs/r0.23.5/ (likewise r2.0.2-alpha/), which says: "Getting Started - The Hadoop documentation includes the information you need to get started using Hadoop. Begin with the Single Node Setup which shows you how to set up a single-node Hadoop installation." They have downloaded hadoop-0.23.5.tar.gz and want to know what to do next. Why are there potentially two applications – hadoop-common and hadoop-hdfs – and not just one? (The download doesn't appear to contain two separate apps.) If there is indeed just one app, can we remove the other from the text above to avoid confusion?

      Again, I just downloaded hadoop-0.23.5.tar.gz – do I need to download more? If so, let us know in the docs here.

      Also, regarding the fragment "Assuming you have installed hadoop-common/hadoop-hdfs...": no, I haven't – explaining how to do that is what this page is for. How do I install these two (or just one of them)?

      Also, what do I set $HADOOP_COMMON_HOME and/or $HADOOP_HDFS_HOME to?
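For readers stuck on the same question: with a release that ships as a single tarball, it appears both variables can simply point at the one untarred directory. A sketch, with an illustrative (assumed, not documented) path:

```shell
# Sketch: single-tarball layout, where Common and HDFS live in the same
# untarred directory. HADOOP_PREFIX and its path are illustrative only.
HADOOP_PREFIX="$HOME/hadoop-0.23.5"
export HADOOP_COMMON_HOME="$HADOOP_PREFIX"
export HADOOP_HDFS_HOME="$HADOOP_PREFIX"
```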

      4.) Quote: "NOTE: The following instructions assume you have hdfs running." No, I don't – how do I do this? Again, this page is supposed to teach me that.
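For anyone blocked here, the usual sequence in the 0.23.x/2.x tarball layout seems to be to format the namenode once and then start the HDFS daemons from sbin. A sketch only, assuming the environment variables above are set and `$HADOOP_PREFIX` is the untarred directory (an illustrative name, not from these docs); this requires a working Hadoop install, so it is a procedure fragment rather than a runnable example:

```shell
# One-time: initialize the namenode's storage directory.
$HADOOP_PREFIX/bin/hdfs namenode -format

# Start the HDFS daemons for a single-node setup.
$HADOOP_PREFIX/sbin/hadoop-daemon.sh start namenode
$HADOOP_PREFIX/sbin/hadoop-daemon.sh start datanode
```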

      5.) Quote: "To start the ResourceManager and NodeManager, you will have to update the configs. Assuming your $HADOOP_CONF_DIR is the configuration directory..."

      Could you clarify what the "configuration directory" is? It doesn't exist in the 0.23.5 download – I just see bin, etc, include, lib, libexec, sbin and share folders, but no "conf" one.
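For what it's worth, in this tarball layout the configuration files appear to live under `etc/hadoop` rather than a top-level `conf/` directory, so the following seems plausible (path is illustrative):

```shell
# Sketch: in the 0.23.x/2.x tarball layout the *-site.xml files sit under
# etc/hadoop, not conf/. HADOOP_PREFIX and its path are illustrative only.
HADOOP_PREFIX="$HOME/hadoop-0.23.5"
export HADOOP_CONF_DIR="$HADOOP_PREFIX/etc/hadoop"
```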

      6.) Quote: "Assuming that the environment variables $HADOOP_COMMON_HOME, $HADOOP_HDFS_HOME, $HADOO_MAPRED_HOME, $YARN_HOME, $JAVA_HOME and $HADOOP_CONF_DIR have been set appropriately."

      We'll need to know what to set YARN_HOME to here.
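Pulling the whole list together: with a single untarred release, every one of the quoted variables – including YARN_HOME – appears to point at the same directory. A consolidated sketch, with illustrative (assumed) paths:

```shell
# Sketch: all *_HOME variables and YARN_HOME point at the one untarred
# release directory. JAVA_HOME and the paths below are illustrative.
export JAVA_HOME="/usr/lib/jvm/java"      # wherever your JDK lives
HADOOP_PREFIX="$HOME/hadoop-0.23.5"
export HADOOP_COMMON_HOME="$HADOOP_PREFIX"
export HADOOP_HDFS_HOME="$HADOOP_PREFIX"
export HADOOP_MAPRED_HOME="$HADOOP_PREFIX"
export YARN_HOME="$HADOOP_PREFIX"
export HADOOP_CONF_DIR="$HADOOP_PREFIX/etc/hadoop"
```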

      Thanks!
      Glen


          Activity

          Andy Isaacson added a comment -

          Note that the docs are being converted from XDOC to APT; see HADOOP-8427 and HADOOP-9190. So please convert single_node_setup.xml to APT before editing the content, if at all possible.

          Andy Isaacson added a comment -

          I've converted the xdoc to SingleNodeSetup.apt.vm in HADOOP-9221.

          John Conwell added a comment -

          It doesn't look like this JIRA ticket is blocked anymore, but the single-node cluster setup instructions still have not been updated. This page is where a beginner goes to set up and configure a pseudo-cluster, so in step two (setting up the environment) you should not assume that anything has been configured yet.

          The main questions are:

          • What should HADOOP_COMMON_HOME be set to
          • What should HADOOP_HDFS_HOME be set to
          • If this is a single tar install, why does it say "Assuming you have installed hadoop-common/hadoop-hdfs" then say "untar hadoop mapreduce tarball". Does that mean there are two tarballs, one for common/hdfs, and one for mapreduce/yarn?
          • Again, if this is a single tar install, why does it say "The following instructions assume you have hdfs running". This makes it sound like I have an hdfs tarball I have to install, configure, then start before I can install mapreduce/yarn
          • The first mention of HADOOP_CONF_DIR states that it should already be set. Set to what? I thought this was the "how to install a single node cluster" page. Is there some other page I need to visit first to install and configure something else? Where is it?

          Lots of confusion going on in this page.


            People

            • Assignee: Unassigned
            • Reporter: Glen Mazza
            • Votes: 5
            • Watchers: 9
