  1. Hadoop Common
  2. HADOOP-7939

Improve Hadoop subcomponent integration in Hadoop 0.23

    Details

    • Type: Improvement
    • Status: Closed
    • Priority: Major
    • Resolution: Fixed
    • Affects Version/s: 0.23.0
    • Fix Version/s: 0.23.1
    • Component/s: build, conf, documentation, scripts
    • Labels:
      None
    • Hadoop Flags:
      Reviewed

      Description

      Introduction

      For the rest of this proposal it is assumed that the current set
      of Hadoop subcomponents is:

      • hadoop-common
      • hadoop-hdfs
      • hadoop-yarn
      • hadoop-mapreduce

      Note, though, that this is an open-ended list. For example,
      implementations of additional frameworks on top of YARN (e.g. MPI) would
      also be considered subcomponents.

      Problem statement

      Currently, launcher scripts, configuration scripts and Java implementation
      code contain coupling and hard-coded assumptions that prevent us from
      treating the subcomponents of Hadoop independently of each other. In many
      places it is assumed that bits and pieces from individual subcomponents
      must be located at predefined places and cannot be dynamically
      registered or discovered at runtime. This prevents a truly flexible
      deployment of Hadoop 0.23.

      Proposal

      NOTE: this is NOT a proposal for redefining the layout from HADOOP-6255.
      The goal here is to keep as much of that layout in place as possible,
      while permitting different deployment layouts.

      The aim of this proposal is to introduce the level of indirection and
      flexibility needed to accommodate both the currently assumed layout of
      Hadoop tarball deployments and all other styles of deployment. To this end,
      the following set of environment variables needs to be used uniformly in all
      of the subcomponents' launcher scripts, configuration scripts and Java code
      (<SC> stands for the literal name of a subcomponent). These variables are
      expected to be defined by <SC>-env.sh scripts, and sourcing those files is
      expected to set the environment up correctly.

      1. HADOOP_<SC>_HOME
        1. root of the subtree in a filesystem where a subcomponent is expected to be installed
        2. default value: $0/..
      2. HADOOP_<SC>_JARS
        1. a subdirectory with all of the jar files comprising the subcomponent's implementation
        2. default value: $(HADOOP_<SC>_HOME)/share/hadoop/$(<SC>)
      3. HADOOP_<SC>_EXT_JARS
        1. a subdirectory with all of the jar files needed for extended functionality of the subcomponent (not essential for the basic functionality to work correctly)
        2. default value: $(HADOOP_<SC>_HOME)/share/hadoop/$(<SC>)/ext
      4. HADOOP_<SC>_NATIVE_LIBS
        1. a subdirectory with all of the native libraries that the subcomponent requires
        2. default value: $(HADOOP_<SC>_HOME)/share/hadoop/$(<SC>)/native
      5. HADOOP_<SC>_BIN
        1. a subdirectory with all of the launcher scripts specific to the client side of the component
        2. default value: $(HADOOP_<SC>_HOME)/bin
      6. HADOOP_<SC>_SBIN
        1. a subdirectory with all of the launcher scripts specific to the server/system side of the component
        2. default value: $(HADOOP_<SC>_HOME)/sbin
      7. HADOOP_<SC>_LIBEXEC
        1. a subdirectory with all of the launcher scripts that are internal to the implementation and should not be invoked directly
        2. default value: $(HADOOP_<SC>_HOME)/libexec
      8. HADOOP_<SC>_CONF
        1. a subdirectory containing configuration files for a subcomponent
        2. default value: $(HADOOP_<SC>_HOME)/conf
      9. HADOOP_<SC>_DATA
        1. a subtree in the local filesystem for storing the subcomponent's persistent state
        2. default value: $(HADOOP_<SC>_HOME)/data
      10. HADOOP_<SC>_LOG
        1. a subdirectory where the subcomponent's log files are stored
        2. default value: $(HADOOP_<SC>_HOME)/log
      11. HADOOP_<SC>_RUN
        1. a subdirectory with runtime, system-specific information
        2. default value: $(HADOOP_<SC>_HOME)/run
      12. HADOOP_<SC>_TMP
        1. a subdirectory with temporary files
        2. default value: $(HADOOP_<SC>_HOME)/tmp
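
The defaults listed above could be applied by an <SC>-env.sh script along the following lines. This is only a sketch of the idea, not the code from the attached patches: the function name hadoop_sc_defaults and its arguments are hypothetical, and it assumes that <SC> consists of plain letters (e.g. "hdfs") and that a value already present in the environment should win over the default.

```shell
# Hypothetical helper (not taken from the attached patches): export the
# proposed HADOOP_<SC>_* variables for one subcomponent, keeping any value
# that is already set in the environment.
# Usage: hadoop_sc_defaults <SC> <default-home>
hadoop_sc_defaults() {
  _sc=$1    # subcomponent name as used in share/hadoop/<SC>, e.g. "hdfs"
  _prefix="HADOOP_$(printf '%s' "$_sc" | tr 'a-z' 'A-Z')"

  # HADOOP_<SC>_HOME: use the caller-supplied default only if it is unset.
  eval ": \"\${${_prefix}_HOME:=\$2}\""
  eval "_home=\$${_prefix}_HOME"
  export "${_prefix}_HOME"

  # Each entry is SUFFIX:path-relative-to-home, following the list above.
  for _entry in \
      "JARS:share/hadoop/$_sc" \
      "EXT_JARS:share/hadoop/$_sc/ext" \
      "NATIVE_LIBS:share/hadoop/$_sc/native" \
      "BIN:bin" \
      "SBIN:sbin" \
      "LIBEXEC:libexec" \
      "CONF:conf" \
      "DATA:data" \
      "LOG:log" \
      "RUN:run" \
      "TMP:tmp"
  do
    _name="${_prefix}_${_entry%%:*}"
    _path="$_home/${_entry#*:}"
    # Assign the default only when the variable is unset or empty.
    eval ": \"\${$_name:=\$_path}\""
    export "$_name"
  done
}
```

A script would call it as, for example, hadoop_sc_defaults hdfs /opt/hadoop (the path is illustrative), passing whatever directory it resolves for the $0/.. default. Setting HADOOP_HDFS_LOG beforehand would leave that one variable untouched while the rest fall back to the layout under HADOOP_HDFS_HOME.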

        Attachments

        1. hadoop-layout.sh
          0.9 kB
          Roman Shaposhnik
        2. HADOOP-7939-simplified-3.patch.txt
          9 kB
          Roman Shaposhnik
        3. HADOOP-7939-simplified-2.patch.txt
          9 kB
          Roman Shaposhnik
        4. HADOOP-7939-simplified.patch.txt
          9 kB
          Roman Shaposhnik
        5. HADOOP-7939.patch.txt
          30 kB
          Roman Shaposhnik

              People

              • Assignee:
                rvs Roman Shaposhnik
                Reporter:
                rvs Roman Shaposhnik
              • Votes:
                2
                Watchers:
                25
