PIG-2672: Optimize the use of DistributedCache

    Details

    • Type: Improvement
    • Status: Resolved
    • Priority: Major
    • Resolution: Fixed
    • Affects Version/s: None
    • Fix Version/s: 0.13.0
    • Component/s: None
    • Labels: None

      Description

      Pig currently copies jar files to a temporary location in HDFS and then adds them to the DistributedCache for each job launched. This is inefficient in terms of:

      • Space - The jars are distributed to the tasktrackers for every job, taking up a lot of local temporary space on the tasktrackers.
      • Performance - The jar distribution impacts the job launch time.

      Attachments

      1. PIG-2672-10.patch
        12 kB
        Aniket Mokashi
      2. PIG-2672-7.patch
        11 kB
        Aniket Mokashi
      3. PIG-2672-5.patch
        43 kB
        Aniket Mokashi
      4. PIG-2672.patch
        23 kB
        Aniket Mokashi

        Issue Links

          Activity

          Rohini Palaniswamy created issue -
          Rohini Palaniswamy added a comment -

          Proposed Solution:

          • For each user, create a .pig directory. For example: /user/rohini/.pig. Copy the pig libraries to /user/rohini/.pig/piglib/pig-[version]/ and then add them to the distributed cache. If the jars are already present in HDFS, just add them to the distributed cache.
          • Copy the user libraries to /user/rohini/.pig/userlib/jarname-[checksum|filesize].jar and then add them to the distributed cache. If a jar with the same checksum is already present in HDFS, just add it to the distributed cache.
          • This will allow shipping of jars/udfs only once to the cluster and prevent multiple copies in different locations on the tasktracker.
          • The reasoning for including the checksum or filesize in the copied jar's name is to avoid job failures due to overwriting of jars. For example: there is a user jar that is copied as part of one pig job. If the user runs another pig job with a modified version of the same jar while the old job is running, there will be a conflict. The cleanup job checks whether the files in the distributed cache have the same timestamp as the original HDFS file and fails the job if that is not the case. So even if the old job's map/reduce tasks completed successfully, it will fail in cleanup.
          • This solution can be controlled by a configuration setting. If turned off, it can revert to the old behaviour.

          We have used this approach for our data-loading application, which runs upwards of 50K jobs every day, each shipping around 5 jars, and it improved job launch performance quite a bit. With the larger number of jars in pig it should show even more improvement in performance. Currently pig takes a relatively long time to launch a job.
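
          For illustration only, a minimal sketch of the per-user library cache described above, written against the Hadoop 1.x FileSystem and DistributedCache APIs; the class and method names are hypothetical and not taken from the attached patches:

              import java.io.IOException;

              import org.apache.hadoop.conf.Configuration;
              import org.apache.hadoop.filecache.DistributedCache;
              import org.apache.hadoop.fs.FileSystem;
              import org.apache.hadoop.fs.Path;

              public class PigLibCacheSketch {
                  /**
                   * Ships a Pig library jar via a versioned per-user HDFS directory,
                   * copying it only when it is not already there.
                   */
                  public static void shipPigLib(String localJar, String pigVersion, Configuration conf)
                          throws IOException {
                      FileSystem fs = FileSystem.get(conf);
                      String jarName = new Path(localJar).getName();
                      // e.g. /user/rohini/.pig/piglib/pig-0.11.0/pig.jar
                      Path cached = new Path(fs.getHomeDirectory(),
                              ".pig/piglib/pig-" + pigVersion + "/" + jarName);
                      if (!fs.exists(cached)) {
                          fs.copyFromLocalFile(new Path(localJar), cached);
                      }
                      // Register the cached copy; it is uploaded once and reused by later jobs.
                      DistributedCache.addFileToClassPath(cached, conf);
                  }
              }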

          Daniel Dai added a comment -

          Yes, this is in line with our observation in HCATALOG-385. Shipping a new jar to HDFS creates a new entry in the distributed cache; reusing them does not.

          Moreover, we see issues when there are too many jars in the distributed cache (hadoop also unjars them); we run out of inodes.

          +1 for this change.

          Rohini Palaniswamy added a comment -

          Yes. Pig also currently does DistributedCache.addFileToClassPath (or tmpfiles in the jobconf). It should be DistributedCache.addArchiveToClassPath (or tmpjars in the jobconf) instead.

          I have not seen unjar happening when you do DistributedCache.addArchiveToClassPath.
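
          For reference, a minimal sketch contrasting the two DistributedCache calls mentioned above (Hadoop 1.x API); the HDFS path and class name are made up for the example:

              import java.io.IOException;

              import org.apache.hadoop.conf.Configuration;
              import org.apache.hadoop.filecache.DistributedCache;
              import org.apache.hadoop.fs.Path;

              public class CacheRegistrationSketch {
                  public static void register(Configuration conf) throws IOException {
                      // Hypothetical HDFS path of an already-uploaded jar.
                      Path jarInHdfs = new Path("/user/rohini/.pig/userlib/myudfs.jar");

                      // What Pig does today: register the jar as a classpath file.
                      DistributedCache.addFileToClassPath(jarInHdfs, conf);

                      // Suggested alternative: register it as a classpath archive instead.
                      DistributedCache.addArchiveToClassPath(jarInHdfs, conf);
                  }
              }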

          Daniel Dai added a comment -

          Which version of hadoop are you using? I also notice that in some versions of hadoop (e.g., 0.23) it does not unjar, which spawns some other issues such as PIG-2486.

          Dmitriy V. Ryaboy added a comment -

          This would be a great addition.
          A couple of proposed refinements to the design:

          1) The same behavior should happen on the local client for cases when users register jars from HDFS (no need to copy if a jar with the same name+checksum is cached locally).
          2) The directory should be .pig/jarcache/ or similar.
          3) We should be very explicit about documenting this behavior and provide management tools for this cache, so people don't get surprised as it grows progressively bigger.
          4) It could be helpful to have a configurable cluster-level cache, instead of or in addition to the user-level cache, for cases when many users are using the same jar. There may be security concerns with that.

          Daniel Dai added a comment -

          Sounds reasonable. Yes, for security reasons, we can start with a user-level cache.

          Daniel Dai added a comment -

          Also, this sounds like a more general issue for Hadoop. I feel it would be better if Hadoop solved this problem, which would benefit HCat, Hive, etc. We should open a ticket in Hadoop.

          Dmitriy V. Ryaboy added a comment -

          Can we do both? I can roll pig versions much faster than I can roll Hadoop versions (no restart required, fewer moving parts...)

          Rohini Palaniswamy added a comment -

          We have used all versions of hadoop from 0.20 and 0.20S to 23 and have never seen it unjarred till now. I verified this by checking the cache directory of production tasktrackers on both 0.20.205 and 0.23. They are not unjarred and we are certainly using "tmpjars".

          But looking at the code in TrackerDistributedCacheManager, I am wondering why it did not unjar. The code definitely seems to be unjarring. Confused and need to dig deeper.

          Daniel Dai added a comment -

          Yes, even if Hadoop agrees to fix it, that is a long time away. We can fix it in Pig, and I will definitely port it to HCat, and maybe Hive.

          Rohini Palaniswamy made changes -
          Field Original Value New Value
          Assignee Rohini Palaniswamy [ rohini ]
          Aniket Mokashi added a comment -

          For the unjar behavior, I think hadoop does not unjar if it has https://issues.apache.org/jira/browse/MAPREDUCE-967.

          Aniket Mokashi made changes -
          Link This issue relates to HADOOP-9639 [ HADOOP-9639 ]
          Aniket Mokashi made changes -
          Fix Version/s 0.12 [ 12323380 ]
          Aniket Mokashi made changes -
          Attachment PIG-2672.patch [ 12604324 ]
          Aniket Mokashi made changes -
          Status Open [ 1 ] Patch Available [ 10002 ]
          Aniket Mokashi added a comment -

          I have attached a patch that adds two configuration parameters: cluster.cache.location and user.cache.location.

          Jars are copied to <cache.location>/a/b/c/checksum-jarname.jar where a, b, c are the first 3 characters of the checksum. When a new jar is registered, its checksum is calculated and we check whether a jar with the same name/checksum exists in the cache. If yes, the copy to HDFS is avoided.

          Permission to write to the cache is managed by HDFS permissions. Also, it is not possible to overwrite a jar using this mechanism: if a jar changes, its checksum will also change and it will be a new jar in the cache. Removal of old jars is a manual step; admins/users can list jars under the cache location and remove the ones that are very old. Alternatively, you can delete all the jars in the cache or change the jar cache location, and the cache will be repopulated by running jobs.

          If this approach looks reasonable, I can add a few more tests. Comments welcome!
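
          To make the naming scheme concrete, a minimal sketch of how such a cache path could be derived and used; this is an illustration under the assumptions described in the comment, not the code from the attached patch:

              import java.io.FileInputStream;
              import java.io.IOException;

              import org.apache.commons.codec.digest.DigestUtils;
              import org.apache.hadoop.fs.FileSystem;
              import org.apache.hadoop.fs.Path;

              public class JarCachePathSketch {
                  /**
                   * Computes <cacheLocation>/a/b/c/<checksum>-<jarName>, where a, b, c
                   * are the first three characters of the jar's SHA-1 checksum.
                   */
                  public static Path cachePath(String cacheLocation, String localJar) throws IOException {
                      FileInputStream in = new FileInputStream(localJar);
                      String checksum;
                      try {
                          checksum = DigestUtils.shaHex(in);
                      } finally {
                          in.close();
                      }
                      String jarName = new Path(localJar).getName();
                      String fanOut = checksum.charAt(0) + "/" + checksum.charAt(1) + "/" + checksum.charAt(2);
                      return new Path(cacheLocation, fanOut + "/" + checksum + "-" + jarName);
                  }

                  /** Copies the jar only if an entry with the same checksum is not already cached. */
                  public static Path cacheIfAbsent(FileSystem fs, String cacheLocation, String localJar)
                          throws IOException {
                      Path cached = cachePath(cacheLocation, localJar);
                      if (!fs.exists(cached)) {
                          fs.copyFromLocalFile(new Path(localJar), cached);
                      }
                      return cached;
                  }
              }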

          Aniket Mokashi added a comment -

          Note: HADOOP-9639 has an improved mechanism for this. However, this is still somewhat useful for users who are on old versions of hadoop.

          Aniket Mokashi made changes -
          Assignee Rohini Palaniswamy [ rohini ] Aniket Mokashi [ aniket486 ]
          Rohini Palaniswamy added a comment -

          I can take a look at this one. Can you put this up in review board please?

          Dmitriy V. Ryaboy added a comment -

          Aniket, can we prefix the properties with "pig."? That way we won't conflict with potential properties from Hadoop, and it's a little easier to analyze stuff when looking at the jobconf.

          Aniket Mokashi added a comment -

          RB: https://reviews.apache.org/r/14274/
          Aniket Mokashi added a comment -

          Thanks Dmitriy! I will make those changes.

          Aniket Mokashi added a comment -

          Oh, actually I just noticed that the config names are pig.shared.cluster.cache.location and pig.shared.user.cache.location.

          Koji Noguchi added a comment -

          Note: HADOOP-9639 has improved mechanism for this.

          I haven't read the patch but I thought HADOOP-9639 introduces a security hole unless NodeManager does the SHA-1 level checksumming.

          Rohini Palaniswamy added a comment -

          Not able to access the review board. Is it just me, or is the review board down?

          Cheolsoo Park added a comment -

          It seems down to me as well.

          Cheolsoo Park added a comment -

          Aniket Mokashi, I made some minor comments in the RB, mostly coding style-related. I haven't tested the patch on a real cluster, but can I assume you did?
          Rohini Palaniswamy, please take another look. You're more familiar with hdfs than I am.

          Cheolsoo Park added a comment -

          Forgot to mention. Aniket Mokashi, can you please document this? Perhaps on the performance and efficiency page?

          Koji Noguchi added a comment -

          In a secure hadoop environment, this patch would basically create a hole and allow any user having write access to PIG_SHARED_CLUSTER_CACHE_LOCATION to become other users (who are sharing this cache location).

          For now, can we instead limit the patch to /user/<username>/.pig or .staging and have an extra check for permission 700?

          I understand that you can make PIG_SHARED_CLUSTER_CACHE_LOCATION writable only by an admin, but I'm afraid this patch would make it too easy to misconfigure.

          Aniket Mokashi added a comment -

          Cheolsoo Park, thanks for your comments. I will work on the patch to make it more production-ready. I have tried it on a simple job, but not in production yet.

          Koji Noguchi, I do not understand your concern here. Currently jars get copied to /tmp/temp-<random>/, which can be written to by all users. I do not see how the jar cache is less secure than the current approach. In fact, any misconfiguration is still protected by SHA (hard to collide).

          I do not see any benefit in restricting it to /user/<username>/.pig, as it is not mandatory to have that directory secured for users (am I right?). If you look closely, the cluster cache and the user cache have exactly the same behavior. The only reason we have two is for easy configuration and better dedup of jars across the cluster.

          Rohini Palaniswamy added a comment -

          Aniket Mokashi,

          Currently jars get copied to /tmp/temp-<random>/ which can be written by all users

          No, they do not. They go into /user/<username>/.staging, which is readable and writable only by that user. Even if they were to go to /tmp/temp- (where the intermediate files now go), we have dfs.umaskmode set to 077, so only the user has rwx and no one else does.

          It is good to have a shared cluster location, but if someone accidentally deletes that directory then all user jobs already launched will fail. It would be good if you can add a check to see whether the cache dir is writable before trying to create it there. People with a multi-tenant environment like us can then choose to place frequently used jars in the shared cluster location but protect it with 755 so that others don't write into it.

          Rohini Palaniswamy added a comment -

          I guess you don't have to check for permissions as you are anyway returning null on an IOException.

          Koji Noguchi added a comment -

          In fact, any misconfiguration is still protected by SHA (hard to collide).

          SHA is meaningless here unless it is verified by a trusted entity (the NodeManager or TaskTracker in HADOOP-9639).
          Say abc.jar was installed locally. UserEvil can figure out what the shared hdfs path is, since he has access to the local file.
          Then UserEvil can upload any kind of jar with that filename, as long as he is the first user to upload.

          Now, any user trying to use this local abc.jar would unknowingly be executing the random jar uploaded by UserEvil.

          Aniket Mokashi added a comment -

          Rohini Palaniswamy, from the current code, we have:

          Path dst = new Path(FileLocalizer.getTemporaryPath(pigContext).toUri().getPath(), suffix);

          Hence, files are (by default) copied to /tmp/temp-<random>/. I do not see a way to configure it to a relative path, but I might be wrong.

          UserEvil can figure out what the shared hdfs path is since he has access to the local file.

          This is true even today, where UserEvil can look into the jobconf to find the location of the jars and replace whatever jars he wants. Even if they are protected as Rohini explained earlier, the protection still comes from HDFS and not from pig.

          I'm deliberately avoiding permission checks in this code path. In terms of security, I feel that this is no worse than what we have right now.

          Next steps-
          1. Address code review comments from RB and submit a fresh patch.
          2. Run this for several jobs in practice and ensure there are no bad/side effects.
          3. Cheolsoo Park, can you please help me with e2e for this?
          4. Open a documentation jira and explain how this works in pig docs.

          Anything else I missed?

          Cheolsoo Park added a comment -

          Aniket Mokashi, are you asking me to run the e2e tests with your new patch? Yes.

          Rohini Palaniswamy added a comment -

          Aniket Mokashi,
          It is an issue that we need to fix. We need to set 700 on FileLocalizer.relativeRoot when we create it.

          If you look at the getStagingDir() method in http://svn.apache.org/viewvc/hadoop/common/branches/branch-1.0/src/mapred/org/apache/hadoop/mapreduce/JobSubmissionFiles.java?revision=1206848&view=markup, they check and throw an error if the staging dir where the jars are created is not 700 and owned by that user. We need to add that check for the user cache location as well, else it is a security hole.

          Jason Lowe added a comment -

          I'm deliberately avoiding in permission checks in this code path. In terms of security, I feel that this is no worse than what we have right now.

          A shared cache where anyone can write is indeed worse. Today jars are being uploaded to HDFS into a private staging directory where no other normal user can interfere. If the staging directory were to become publicly writeable then it becomes trivial to compromise all users trying to run the same pig jar using a scheme like Koji Noguchi pointed out. I don't see how one can accomplish the same level of havoc today. Even if there's a window in the local filesystem where one can hijack a jar, that requires access to the same node where the user is launching the job. In the publicly-writeable shared cache scheme, one only needs access to HDFS from any node and clients on all nodes using the shared cache can be compromised.

          Besides malicious users, the shared cache can also be accidentally made ineffective by clients. For example, a user with a restrictive umask (e.g.: 077) uploads a jar to the shared cache, and all the directories and files were created such that others can't read them. Now because the permissions are incorrect any other user can't share the file and any other user's file that happens to have the same initial digit(s) in its hash can't be uploaded to the shared cache. And then there's the client that deletes files in-use by other clients, breaking their jobs.

          In short, shared public caches that are publicly writeable are going to be problematic, especially in secure setups. As such I think there should at least be some documentation describing the risks of enabling it and how it could be used in a read-only manner for sharing securely, i.e.: shared cache is publicly readable but only writeable by admins who manually maintain the entries in the shared cache.

          Aniket Mokashi added a comment -

          Today jars are being uploaded to HDFS into a private staging directory where no other normal user can interfere

          Where in pig do we mark this private? Can you point me to the line number? If it is outside of pig, we can do the same even now.

          Aniket Mokashi made changes -
          Status Patch Available [ 10002 ] Open [ 1 ]
          Aniket Mokashi made changes -
          Assignee Aniket Mokashi [ aniket486 ]
          Aniket Mokashi added a comment -

          We need to set 700 on FileLocalizer.relativeRoot when we create it.

          If it is 700, others cannot read the cache. The user can only share jars with himself.

          Rohini Palaniswamy added a comment -

          can you point me to the line number? If its outside of pig, we can do the same even now.

          job.jar, job.xml, etc. are copied to /user/<username>/.staging by the JobClient. Refer to my previous comment for the source code reference in hadoop.

          Rohini Palaniswamy added a comment -

          FileLocalizer.relativeRoot is not the cache location. It is the directory created for each pig script run to store temporary data. We need to set 700 on it.

          Aniket Mokashi added a comment -

          Currently, jars are copied to the path computed by new Path(FileLocalizer.getTemporaryPath(pigContext).toUri().getPath(), suffix).

          Aniket Mokashi added a comment -

          I think we are confusing job.jar with shipped jars.

          Rohini Palaniswamy added a comment -

          To clarify:
          1) In 0.10, all jars (including pig and registered jars) were packaged into job.jar, which was copied into /user/<username>/.staging by the JobClient. In 0.11, registered extra jars are copied to FileLocalizer.getTemporaryPath(pigContext), which is a directory under FileLocalizer.relativeRoot, but job.jar is still copied into /user/<username>/.staging by the JobClient. To address the FileLocalizer.getTemporaryPath security issue we need to set 700 on FileLocalizer.relativeRoot. This is an existing security problem in 0.11. With your patch you copy to a shared or user cache location, and if neither is configured you still fall back to FileLocalizer.getTemporaryPath, so it needs to be addressed.
          2) The second thing is writing to the user cache location, which is introduced in this patch. Before writing to it we need to check that it is 700 and owned by that user, similar to the check done by the JobClient for /user/<username>/.staging.
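
          A minimal sketch of the kind of ownership and permission check described in point 2, modeled loosely on JobSubmissionFiles.getStagingDir; the class and method names are hypothetical:

              import java.io.IOException;

              import org.apache.hadoop.fs.FileStatus;
              import org.apache.hadoop.fs.FileSystem;
              import org.apache.hadoop.fs.Path;
              import org.apache.hadoop.fs.permission.FsPermission;
              import org.apache.hadoop.security.UserGroupInformation;

              public class UserCacheDirCheck {
                  private static final FsPermission USER_ONLY = FsPermission.createImmutable((short) 0700);

                  /** Refuses to use the user cache dir unless it is 700 and owned by the current user. */
                  public static void verify(FileSystem fs, Path cacheDir) throws IOException {
                      String currentUser = UserGroupInformation.getCurrentUser().getShortUserName();
                      FileStatus status = fs.getFileStatus(cacheDir);
                      if (!status.getOwner().equals(currentUser)
                              || !status.getPermission().equals(USER_ONLY)) {
                          throw new IOException("User cache dir " + cacheDir + " must be owned by "
                                  + currentUser + " with permission 700");
                      }
                  }
              }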

          Aniket Mokashi added a comment -

          Thanks everyone for the comments!

          We might need a little more refactoring before we change permissions on the temporary path. Currently FileLocalizer.getTemporaryPath(pigContext) is used for several things (including intermediate data, I think). Let me open a jira to track that.

          I will try your approach (shared user cache) and submit a new patch soon.

          Rohini Palaniswamy added a comment -

          We might need little more refactoring before we change permissions on temporary path. Currently FileLocalizer.getTemporaryPath(pigContext); is getting used for several things. (+for intermediate data, I think)

          Yes. All intermediate data goes into that. Setting 700 should not cause problems, but ElementDescriptor does not have methods for chmod and that would require refactoring, which is probably what you meant. So a separate jira sounds good.

          Daniel Dai made changes -
          Fix Version/s 0.13.0 [ 12324971 ]
          Fix Version/s 0.12.0 [ 12323380 ]
          Aniket Mokashi added a comment -

          Opened: https://issues.apache.org/jira/browse/PIG-3511
          Aniket Mokashi added a comment -

          Another attempt:
          Using stagingDir = JobSubmissionFiles.getStagingDir(jobClient, conf); to copy the shared files. Obviously, this is not the perfect solution to this problem, and YARN-1492 will present a better fix for it.

          Aniket Mokashi made changes -
          Attachment PIG-2672-5.patch [ 12624061 ]
          Aniket Mokashi added a comment -

          I realized this fix won't work with Hadoop 2 (at least not easily). Let me try to add some shims to fix it.

          In the meantime, please comment on the approach.

          Aniket Mokashi added a comment - (edited)

          Another proposal: we create /tmp/$user.name/jarcache with permission 700 and use it as a user-level jar cache. Also, every time a jar is used from the jar cache, we do fs.setTimes(jarpath, now, now) to update the atime/mtime of the jar (to avoid cleanups).
          Rohini Palaniswamy, thoughts?

          Rohini Palaniswamy added a comment -

          > JobSubmissionFiles.getStagingDir(jobClient, conf); or We create /tmp/$user.name/jarcache
          I think we should create /user/$user.name/.pig/filecache (not calling it jarcache, as we can have files used in streaming as well) and set the permissions of filecache to 700. That way it is cleaner (long-term user data stays in the user directory) and we also don't have to rely on hadoop APIs to get the .staging dir location. Please do not modify the mtime of the jar: if a distributed cache jar's mtime has been modified when a job completes, hadoop fails the job.
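
          A minimal sketch of creating such a filecache directory with 700 permissions, under the assumptions in the comment above; the class and method names are hypothetical:

              import java.io.IOException;

              import org.apache.hadoop.fs.FileSystem;
              import org.apache.hadoop.fs.Path;
              import org.apache.hadoop.fs.permission.FsPermission;

              public class FileCacheDirSketch {
                  /** Creates /user/<username>/.pig/filecache with permission 700 if it does not exist. */
                  public static Path ensureFileCacheDir(FileSystem fs) throws IOException {
                      Path cacheDir = new Path(fs.getHomeDirectory(), ".pig/filecache");
                      FsPermission userOnly = new FsPermission((short) 0700);
                      if (!fs.exists(cacheDir)) {
                          fs.mkdirs(cacheDir, userOnly);
                      }
                      // mkdirs applies the client umask, so set the permission explicitly as well.
                      fs.setPermission(cacheDir, userOnly);
                      return cacheDir;
                  }
              }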

          Aniket Mokashi added a comment -

          Hadoop uses the following code to get the stagingDir:

          Path stagingRootDir =
              new Path(conf.get("mapreduce.jobtracker.staging.root.dir",
                  "/tmp/hadoop/mapred/staging"));

          which indicates that /user/$user.name may not be available (configured). I can use the staging dir stagingRootDir + user + "/.pig". Would that work?

          Koji Noguchi added a comment -

          Thanks Aniket! I like the non-shared approach.

          (to avoid cleanups).

          I had a discussion with Jason Lowe about this before. On our cluster, the longest job a user can run is 1 week (due to the delegation token limit we set). With this assumption, we can create a cache under .Trash as below.

          $ echo /user/$USER/.Trash/$(date -d 'next monday + 1week' +'%y%m%d'000000)
          /user/knoguchi/.Trash/140203000000 (this is the 0.23/2.* format; 0.20 uses a slightly different format)

          This way, files are reused for 1 week and then thrown away later automatically by a Trash cleanup.
          We threw away the idea for various reasons, but just wanted to share it here.

          Rohini Palaniswamy added a comment -

          stagingRootDir + user + "/.pig" is good. But if stagingRootDir starts with fs.getHomeDirectory(), can you make it stagingRootDir + "/.pig"? This will avoid creating /user/<username>/<username>/.pig.

          This way, files are reused for 1 week and then thrown away later automatically by a Trash cleanup.

          The cache will be reused by other pig jobs during and after the week, and we will not be modifying the times of the files. So we can't put it under .Trash, as it will be cleaned up.

          Koji Noguchi added a comment -

          The cache will be reused by other pig jobs during and after the week and we will not be modifying the time of the files. So we can't put that under .Trash as it will be cleaned up.

          Maybe I didn't explain it well enough.
          Tasks will always use the cache that is scheduled for deletion on the Monday after next.
          Given that the longest job (on our cluster) is 1 week, there won't be any jobs using that cache when it's expunged.

          Dmitriy V. Ryaboy added a comment -

          Seems like there is a lot of effort being spent here reinventing what is already designed for the general use case in the yarn ticket Aniket linked. Let's not let the best be the enemy of the good and just get something in that will be decent for most cases; if people don't like it, they can turn it off. This is an intermediate solution until that yarn patch goes in, at which point all of this becomes moot.

          Dmitriy V. Ryaboy added a comment -

          Koji Noguchi in the spirit of keeping things moving – can we commit this? You can feel free to turn the behavior off on your cluster if you are worried about the 1 week boundary. If that's the case, feel free to open another ticket to follow up, or to make sure that YARN-1492 fixes your issue.

          Rohini Palaniswamy added a comment -

          To me, keeping it in the user home directory instead of .Trash is OK. I can discuss with Koji offline.

          Aniket Mokashi,
          I did not see until now that there was an updated patch uploaded. I was waiting for a new patch from you on the review board. If you could update the review board with the recent patch, I will take a quick look and give a +1. It is hard to review a plain patch as the patch is big.

          Koji Noguchi added a comment -

          Koji Noguchi in the spirit of keeping things moving – can we commit this?

          Sure, sure. I'm fine as long as we don't share the jar across multiple users. I didn't mean to block Aniket's latest patch. Sorry about that.

          As for committing, I didn't look at the code in detail, so I'm assuming you or someone else will give a +1 on the patch. Thanks again for listening to my concerns.

          Aniket Mokashi added a comment -

          Rohini Palaniswamy, I should do it today or tomorrow.

          Aniket Mokashi made changes -
          Attachment PIG-2672-7.patch [ 12626226 ]
          Aniket Mokashi made changes -
          Status Open [ 1 ] Patch Available [ 10002 ]
          Assignee Aniket Mokashi [ aniket486 ]
          Aniket Mokashi made changes -
          Attachment PIG-2672-7.patch [ 12626226 ]
          Aniket Mokashi made changes -
          Attachment PIG-2672-7.patch [ 12626239 ]
          Aniket Mokashi made changes -
          Attachment PIG-2672-9.patch [ 12626611 ]
          Aniket Mokashi made changes -
          Attachment PIG-2672-9.patch [ 12626611 ]
          Aniket Mokashi made changes -
          Attachment PIG-2672-10.patch [ 12626742 ]
          Rohini Palaniswamy added a comment -

          +1

          Aniket Mokashi added a comment -

          Committed to trunk. Thanks everyone for your inputs and thanks Rohini Palaniswamy for the review.

          Aniket Mokashi made changes -
          Status Patch Available [ 10002 ] Resolved [ 5 ]
          Resolution Fixed [ 1 ]
          Brock Noland made changes -
          Link This issue is related to HIVE-860 [ HIVE-860 ]
          Brock Noland added a comment -

          FYI, in HIVE-860 a reviewer asked me whether the following code (copied from this patch) closes the stream:

          String checksum = DigestUtils.shaHex(url.openStream());

          It doesn't look like it does, according to the commons-codec source. Therefore I think pig has a file descriptor leak.
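
          For illustration, a minimal sketch of computing the checksum while making sure the stream is closed; the helper class and method names are hypothetical:

              import java.io.IOException;
              import java.io.InputStream;
              import java.net.URL;

              import org.apache.commons.codec.digest.DigestUtils;

              public class ChecksumSketch {
                  /** Computes the SHA-1 hex checksum of the resource, closing the stream afterwards. */
                  public static String shaHexOf(URL url) throws IOException {
                      InputStream in = url.openStream();
                      try {
                          return DigestUtils.shaHex(in);
                      } finally {
                          in.close(); // DigestUtils does not close the stream itself
                      }
                  }
              }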

          Aniket Mokashi added a comment -

          Thanks Brock Noland! Looks like it existed even before this @ https://github.com/apache/pig/blob/branch-0.12/src/org/apache/pig/backend/hadoop/executionengine/mapReduceLayer/JobControlCompiler.java#L1524. Let me open another jira to fix it.

            People

            • Assignee: Aniket Mokashi
            • Reporter: Rohini Palaniswamy
            • Votes: 0
            • Watchers: 10
