Details
- Type: Improvement
- Status: Closed
- Priority: Major
- Resolution: Fixed
- Fix Version/s: 0.23.0
- Component/s: None
- Hadoop Flags: Reviewed
Description
Introduction
For the rest of this proposal it is assumed that the current set
of Hadoop subcomponents is:
- hadoop-common
- hadoop-hdfs
- hadoop-yarn
- hadoop-mapreduce
Note, however, that this is an open-ended list. For example,
implementations of additional frameworks on top of yarn (e.g. MPI) would
also be considered subcomponents.
Problem statement
Currently there is an unfortunate amount of coupling and hard-coding at the
level of launcher scripts, configuration scripts, and Java implementation
code that prevents us from treating the subcomponents of Hadoop independently
of each other. In many places it is assumed that the bits and pieces
of individual subcomponents reside at predefined locations, and they
cannot be dynamically registered or discovered at runtime.
This prevents a truly flexible deployment of Hadoop 0.23.
Proposal
NOTE: this is NOT a proposal for redefining the layout from HADOOP-6255.
The goal here is to keep as much of that layout in place as possible,
while permitting different deployment layouts.
The aim of this proposal is to introduce the level of indirection and
flexibility needed to accommodate both the currently assumed layout of Hadoop
tarball deployments and all other styles of deployment. To this end, the
following set of environment variables needs to be used uniformly in all of
the subcomponents' launcher scripts, configuration scripts, and Java code
(<SC> stands for the literal name of a subcomponent). These variables are
expected to be defined by <SC>-env.sh scripts, and sourcing those files is
expected to have the desired effect of setting the environment up correctly.
- HADOOP_<SC>_HOME
- root of the subtree in a filesystem where a subcomponent is expected to be installed
- default value: $0/..
- HADOOP_<SC>_JARS
- a subdirectory with all of the jar files comprising the subcomponent's implementation
- default value: $(HADOOP_<SC>_HOME)/share/hadoop/$(<SC>)
- HADOOP_<SC>_EXT_JARS
- a subdirectory with all of the jar files needed for extended functionality of the subcomponent (nonessential for the correct operation of the basic functionality)
- default value: $(HADOOP_<SC>_HOME)/share/hadoop/$(<SC>)/ext
- HADOOP_<SC>_NATIVE_LIBS
- a subdirectory with all the native libraries that the component requires
- default value: $(HADOOP_<SC>_HOME)/share/hadoop/$(<SC>)/native
- HADOOP_<SC>_BIN
- a subdirectory with all of the launcher scripts specific to the client side of the component
- default value: $(HADOOP_<SC>_HOME)/bin
- HADOOP_<SC>_SBIN
- a subdirectory with all of the launcher scripts specific to the server/system side of the component
- default value: $(HADOOP_<SC>_HOME)/sbin
- HADOOP_<SC>_LIBEXEC
- a subdirectory with all of the launcher scripts that are internal to the implementation and should not be invoked directly
- default value: $(HADOOP_<SC>_HOME)/libexec
- HADOOP_<SC>_CONF
- a subdirectory containing configuration files for a subcomponent
- default value: $(HADOOP_<SC>_HOME)/conf
- HADOOP_<SC>_DATA
- a subtree in the local filesystem for storing component's persistent state
- default value: $(HADOOP_<SC>_HOME)/data
- HADOOP_<SC>_LOG
- a subdirectory where the subcomponent's log files are stored
- default value: $(HADOOP_<SC>_HOME)/log
- HADOOP_<SC>_RUN
- a subdirectory with runtime system specific information
- default value: $(HADOOP_<SC>_HOME)/run
- HADOOP_<SC>_TMP
- a subdirectory with temporary files
- default value: $(HADOOP_<SC>_HOME)/tmp
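As an illustration, the defaults above can be implemented with the shell's ${VAR:-default} idiom, so that a value exported by the caller always wins and sourcing the script fills in the rest. The sketch below is a hypothetical hdfs-env.sh: the file name, the choice of hdfs as the subcomponent, and the concrete install prefix /usr/lib/hadoop-hdfs are illustrative assumptions, not part of the proposal.

```shell
# Hypothetical hdfs-env.sh (illustrative sketch, not a shipped script).
# Each variable keeps any caller-provided value and otherwise falls back
# to the default layout described in the proposal.

# Root of the installed subtree. The proposal's default is relative to the
# launcher script ($0/..); a fixed prefix is assumed here so the sketch is
# self-contained.
export HADOOP_HDFS_HOME="${HADOOP_HDFS_HOME:-/usr/lib/hadoop-hdfs}"

export HADOOP_HDFS_JARS="${HADOOP_HDFS_JARS:-$HADOOP_HDFS_HOME/share/hadoop/hdfs}"
export HADOOP_HDFS_EXT_JARS="${HADOOP_HDFS_EXT_JARS:-$HADOOP_HDFS_HOME/share/hadoop/hdfs/ext}"
export HADOOP_HDFS_NATIVE_LIBS="${HADOOP_HDFS_NATIVE_LIBS:-$HADOOP_HDFS_HOME/share/hadoop/hdfs/native}"
export HADOOP_HDFS_BIN="${HADOOP_HDFS_BIN:-$HADOOP_HDFS_HOME/bin}"
export HADOOP_HDFS_SBIN="${HADOOP_HDFS_SBIN:-$HADOOP_HDFS_HOME/sbin}"
export HADOOP_HDFS_LIBEXEC="${HADOOP_HDFS_LIBEXEC:-$HADOOP_HDFS_HOME/libexec}"
export HADOOP_HDFS_CONF="${HADOOP_HDFS_CONF:-$HADOOP_HDFS_HOME/conf}"
export HADOOP_HDFS_DATA="${HADOOP_HDFS_DATA:-$HADOOP_HDFS_HOME/data}"
export HADOOP_HDFS_LOG="${HADOOP_HDFS_LOG:-$HADOOP_HDFS_HOME/log}"
export HADOOP_HDFS_RUN="${HADOOP_HDFS_RUN:-$HADOOP_HDFS_HOME/run}"
export HADOOP_HDFS_TMP="${HADOOP_HDFS_TMP:-$HADOOP_HDFS_HOME/tmp}"
```

A launcher script would source this file first and then build its classpath from $HADOOP_HDFS_JARS and $HADOOP_HDFS_EXT_JARS rather than from hard-coded 'bin/../' paths, which is exactly the indirection this proposal calls for.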
Attachments
Issue Links
- is blocked by
- SPARK-1698 Improve spark integration (Closed)
- is duplicated by
- HADOOP-9878 getting rid of all the 'bin/../' from all the paths (Resolved)
- is related to
- BIGTOP-316 split up hadoop packages into common, hdfs, mapreduce (and yarn) (Closed)
- requires
- HDFS-2761 Improve Hadoop subcomponent integration in Hadoop 0.23 (Resolved)
- MAPREDUCE-3635 Improve Hadoop subcomponent integration in Hadoop 0.23 (Resolved)