[HADOOP-435] Encapsulating startup scripts and jars in a single Jar file. - ASF JIRA

Details

Type: New Feature
Status: Closed
Priority: Major
Resolution: Won't Fix
Affects Version/s: 0.12.1
Fix Version/s: 0.13.0
Component/s: None
Labels: None

Description

Currently, Hadoop is a set of scripts, configurations, and jar files, which makes it a pain to install on compute nodes and datanodes. It also makes it a pain to set up clients so that they can use Hadoop. Every time things are updated, the pain begins again.

I suggest that we should be able to build a single jar file that has a Main-Class defined and the configuration built in, so that we can distribute that one file to nodes and clients on updates. One nice thing that I haven't done would be to make the jar file downloadable from the JobTracker web page so that clients can easily submit jobs.
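As a rough illustration of the idea (not the code in the attached patch), the following sketch writes a jar whose manifest names a Main-Class and which carries a configuration file alongside the classes. The class name SingleJarBuilder, the method build, and whatever main class name is passed in are hypothetical; only the java.util.jar calls are standard.

```java
import java.io.IOException;
import java.io.OutputStream;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.jar.Attributes;
import java.util.jar.JarEntry;
import java.util.jar.JarOutputStream;
import java.util.jar.Manifest;

// Hypothetical helper, not part of Hadoop: builds a runnable jar with a bundled config file.
public class SingleJarBuilder {
    public static void build(Path jar, String mainClass, String configName, byte[] config)
            throws IOException {
        Manifest manifest = new Manifest();
        Attributes attrs = manifest.getMainAttributes();
        // Manifest-Version must be set, otherwise the manifest is written out empty.
        attrs.put(Attributes.Name.MANIFEST_VERSION, "1.0");
        // Main-Class is what makes "java -jar <file>" work.
        attrs.put(Attributes.Name.MAIN_CLASS, mainClass);

        try (OutputStream out = Files.newOutputStream(jar);
             JarOutputStream jos = new JarOutputStream(out, manifest)) {
            // Store the configuration at the root of the jar so it is on the classpath.
            jos.putNextEntry(new JarEntry(configName));
            jos.write(config);
            jos.closeEntry();
            // A real build would also add the compiled classes and dependency classes here.
        }
    }
}
```

A jar produced this way can be started with "java -jar" as long as the named main class is actually inside it.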

I currently use such a setup on my small cluster. To start the job tracker I use "java -jar hadoop.jar -l /tmp/log jobtracker"; to submit a job I use "java -jar hadoop.jar jar wordcount.jar". I use the client on my Linux and Mac OS X machines, and all I need installed is Java and the hadoop.jar file.

hadoop.jar helps with log files and configurations. By default the config files are pulled from the jar file, but this can be overridden by specifying a config directory, so that you can easily have machine-specific configs and still have the same hadoop.jar on all machines.
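The lookup order described above can be sketched as follows. This is a minimal illustration under assumptions, not the modified Configuration class from the patch: ConfigLocator and open are hypothetical names, and the override directory is simply passed in as a parameter.

```java
import java.io.IOException;
import java.io.InputStream;
import java.nio.file.Files;
import java.nio.file.Path;

// Hypothetical helper: prefer a machine-specific config directory, else use the bundled copy.
public class ConfigLocator {
    /**
     * Opens the named config file from overrideDir when it exists there,
     * otherwise from the classpath (for example, the root of the jar).
     * Returns null when the file is found in neither place.
     */
    public static InputStream open(Path overrideDir, String name) throws IOException {
        if (overrideDir != null) {
            Path candidate = overrideDir.resolve(name);
            if (Files.isRegularFile(candidate)) {
                return Files.newInputStream(candidate);
            }
        }
        // Fall back to the copy packaged with the classes.
        return ConfigLocator.class.getClassLoader().getResourceAsStream(name);
    }
}
```

With this ordering, the same jar can be shipped everywhere and only machines that need different settings carry a config directory.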

Here are the available commands from hadoop.jar:

USAGE: hadoop [-l logdir] command
  User commands:
    dfs          run a DFS admin client
    jar          run a JAR file
    job          manipulate MapReduce jobs
    fsck         run a DFS filesystem check utility
  Runtime startup commands:
    datanode     run a DFS datanode
    jobtracker   run the MapReduce job Tracker node
    namenode     run the DFS namenode (namenode -format formats the FS)
    tasktracker  run a MapReduce task Tracker node
  HadoopLoader commands:
    buildJar     builds the HadoopLoader jar file
    conf         dump hadoop configuration

Note that I don't have the classes for Hadoop streaming built into this jar file, but if I had, streaming would also appear as an option (the loader checks for the needed classes before displaying an option). This makes it very easy for users who just write scripts to use Hadoop straight from their machines.
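One way to implement the "check for needed classes before displaying an option" behaviour is to probe the classpath for each command's class. The sketch below is an assumption about how that could work, not the patch's actual code; CommandMenu and its methods are hypothetical names, and the caller supplies the command-to-class mapping.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;

// Hypothetical helper: only list commands whose implementing class is on the classpath.
public class CommandMenu {
    /** Returns true if the class can be located, without running its static initializers. */
    static boolean isAvailable(String className) {
        try {
            Class.forName(className, false, CommandMenu.class.getClassLoader());
            return true;
        } catch (ClassNotFoundException | LinkageError e) {
            return false;
        }
    }

    /** Filters a command-name to class-name mapping down to the commands that can run. */
    static List<String> availableCommands(Map<String, String> commandToClass) {
        List<String> available = new ArrayList<>();
        for (Map.Entry<String, String> entry : commandToClass.entrySet()) {
            if (isAvailable(entry.getValue())) {
                available.add(entry.getKey());
            }
        }
        return available;
    }
}
```

Passing an insertion-ordered map (such as a LinkedHashMap) keeps the usage listing in a stable order.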

I'm also attaching the start.sh and stop.sh scripts that I use. These are the only scripts I use to start up the daemons. They are very simple, and the start.sh script uses the config file to figure out whether or not to start the jobtracker and the namenode.

The attached patch adds the HadoopIt patch, modifies the Configuration class to find the config files correctly, and modifies the build to make a fully contained hadoop.jar. To update the configuration in a hadoop.jar you simply use "zip hadoop.jar hadoop-site.xml".
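Where the zip tool is not available, the same in-place update can be approximated from Java with the JDK's zip file system provider. This is a sketch under assumptions rather than part of the patch; JarConfigUpdater and replaceEntry are hypothetical names.

```java
import java.io.IOException;
import java.net.URI;
import java.nio.file.FileSystem;
import java.nio.file.FileSystems;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.StandardCopyOption;
import java.util.HashMap;
import java.util.Map;

// Hypothetical helper: replaces (or adds) one entry inside an existing jar.
public class JarConfigUpdater {
    public static void replaceEntry(Path jar, Path newFile, String entryName) throws IOException {
        // The "jar:" scheme selects the JDK's built-in zip file system provider.
        URI uri = URI.create("jar:" + jar.toUri());
        Map<String, String> env = new HashMap<>();
        env.put("create", "false"); // the jar must already exist

        try (FileSystem zipfs = FileSystems.newFileSystem(uri, env)) {
            // Copy the new config over the entry at the given path inside the archive.
            Files.copy(newFile, zipfs.getPath(entryName), StandardCopyOption.REPLACE_EXISTING);
        } // closing the file system writes the updated archive back to disk
    }
}
```

For example, replaceEntry(Paths.get("hadoop.jar"), Paths.get("hadoop-site.xml"), "hadoop-site.xml") would have roughly the same effect as the zip command quoted above.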

Attachments


Name                                      Date              Size     Author
ASF.LICENSE.NOT.GRANTED--hadoopit.patch   09/Aug/06 14:42   24 kB    Benjamin Reed
hadoop-exe.patch                          24/Apr/07 19:56   19 kB    Thomas White
hadoop-exe.patch                          22/Apr/07 20:42   19 kB    Thomas White
hadoop-exe.patch                          17/Apr/07 19:09   18 kB    Benjamin Reed
hadoop-exe.patch                          21/Mar/07 21:32   18 kB    Benjamin Reed
hadoopit.patch                            23/Feb/07 22:13   15 kB    Benjamin Reed
hadoopit.patch                            31/Jan/07 17:59   17 kB    Benjamin Reed
start.sh                                  11/Aug/06 20:19   0.7 kB   Benjamin Reed
stop.sh                                   11/Aug/06 20:20   0.2 kB   Benjamin Reed

People

Assignee: Unassigned

Reporter: Benjamin Reed

Votes: 0

Watchers: 3

Dates

Created: 09/Aug/06 14:42

Updated: 29/Jul/08 00:37

Resolved: 02/May/07 21:24