[HADOOP-1622] Hadoop should provide a way to allow the user to specify jar file(s) the user job depends on - ASF JIRA


Details

Type: Improvement
Status: Closed
Priority: Major
Resolution: Fixed
Affects Version/s: None
Fix Version/s: 0.17.0
Component/s: None
Labels: None

Release Note:

This patch adds new command-line options to

hadoop jar

which are:

hadoop jar -files <comma-separated list of files> -libjars <comma-separated list of jars> -archives <comma-separated list of archives>

-files allows you to specify a comma-separated list of paths that will be present in the current working directory of your tasks.
-libjars allows you to add jars to the classpaths of the maps and reduces.
-archives allows you to pass archives as arguments; they are unzipped/unjarred, and a link named after each jar/zip is created in the current working directory of the tasks.
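To make the shape of these options concrete, here is a minimal Java sketch of how a command line carrying -files, -libjars and -archives splits into per-option lists. It is hypothetical: the class `JobResourceOptions`, its methods and the file names in `main` are invented for illustration. It is not the parser this patch added (later Hadoop releases handle these options in `GenericOptionsParser`), and it only separates the lists from the remaining arguments without shipping anything to tasks.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Collections;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

/**
 * Hypothetical illustration only: separates the -files, -libjars and
 * -archives options described in the release note from the rest of a
 * command line. This is not the parser the patch added.
 */
public class JobResourceOptions {
    // Option names as given in the release note.
    private static final List<String> OPTIONS =
            Arrays.asList("-files", "-libjars", "-archives");

    private final Map<String, List<String>> resources = new LinkedHashMap<>();
    private final List<String> remainingArgs = new ArrayList<>();

    public JobResourceOptions(String[] args) {
        for (String opt : OPTIONS) {
            resources.put(opt, new ArrayList<>());
        }
        for (int i = 0; i < args.length; i++) {
            String arg = args[i];
            if (OPTIONS.contains(arg)) {
                if (i + 1 >= args.length) {
                    throw new IllegalArgumentException("Missing value for " + arg);
                }
                // Each option takes a single comma-separated list; empty entries are skipped.
                for (String path : args[++i].split(",")) {
                    if (!path.isEmpty()) {
                        resources.get(arg).add(path);
                    }
                }
            } else {
                // The job jar, main class and job arguments pass through unchanged.
                remainingArgs.add(arg);
            }
        }
    }

    /** Returns the paths collected for one option, or an empty list if none were given. */
    public List<String> get(String option) {
        List<String> values = resources.get(option);
        if (values == null) {
            return Collections.emptyList();
        }
        return Collections.unmodifiableList(values);
    }

    /** Returns every argument that was not consumed by one of the three options. */
    public List<String> getRemainingArgs() {
        return Collections.unmodifiableList(remainingArgs);
    }

    public static void main(String[] args) {
        JobResourceOptions opts = new JobResourceOptions(new String[] {
            "-files", "dict.txt,stopwords.txt",
            "-libjars", "j1.jar,j2.jar",
            "-archives", "data.zip",
            "myjob.jar", "input", "output"
        });
        System.out.println(opts.get("-libjars"));    // [j1.jar, j2.jar]
        System.out.println(opts.getRemainingArgs()); // [myjob.jar, input, output]
    }
}
```

In this sketch, repeating an option appends to its list. A real implementation would additionally have to resolve each path and distribute it to the tasks, which is outside the scope of the example.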


Description

More likely than not, a user's job depends on multiple jars.
Right now, when submitting a job through bin/hadoop, there is no way for the user to specify them.
One workaround is to re-package all the dependent jars into a new jar, or to put the dependent jar files in the lib dir of the new jar.
This workaround causes the user unnecessary inconvenience. Furthermore, if the user does not own the main function
(as when the user uses Aggregate, datajoin, or streaming), the user has to re-package those system jar files too.
It is highly desirable for Hadoop to provide a clean and simple way for the user to specify a list of dependent jar files at
job submission time, something like:

bin/hadoop .... --depending_jars j1.jar:j2.jar

Attachments


HADOOP-1622_1.patch (20/Mar/08 23:31, 20 kB, Mahadev Konar)
HADOOP-1622_2.patch (22/Mar/08 02:21, 28 kB, Mahadev Konar)
HADOOP-1622_3.patch (24/Mar/08 21:30, 29 kB, Mahadev Konar)
HADOOP-1622_4.patch (25/Mar/08 00:00, 28 kB, Mahadev Konar)
HADOOP-1622_5.patch (25/Mar/08 16:55, 30 kB, Mahadev Konar)
HADOOP-1622_6.patch (26/Mar/08 17:53, 30 kB, Mahadev Konar)
hadoop-1622-4-20071008.patch (09/Oct/07 19:11, 48 kB, Dennis Kubes)
HADOOP-1622-5.patch (18/Oct/07 20:40, 46 kB, Doug Cutting)
HADOOP-1622-6.patch (19/Oct/07 18:58, 46 kB, Doug Cutting)
HADOOP-1622-7.patch (25/Oct/07 16:31, 44 kB, Doug Cutting)
HADOOP-1622-8.patch (27/Oct/07 06:06, 45 kB, Dennis Kubes)
HADOOP-1622-9.patch (27/Oct/07 21:06, 46 kB, Dennis Kubes)
multipleJobJars.patch (20/Jul/07 04:13, 8 kB, Dennis Kubes)
multipleJobResources.patch (25/Jul/07 07:30, 43 kB, Dennis Kubes)
multipleJobResources2.patch (30/Jul/07 21:25, 44 kB, Dennis Kubes)

Issue Links

is duplicated by

HADOOP-366 Should be able to specify more than one jar into a JobConf file (Closed)

is related to

MAPREDUCE-574 Fix -file option in Streaming to use Distributed Cache (Resolved)


People

Assignee: Mahadev Konar

Reporter: Runping Qi

Votes: 0

Watchers: 9

Dates

Created: 17/Jul/07 18:51

Updated: 08/Jul/09 16:52

Resolved: 26/Mar/08 21:08