Jason Lowe, thanks for the great feedback. You pointed out some additional needed details in the design.
I'd also like to clarify a couple of things; hopefully that will set the stage for answering some of your questions in the process. I see now that these points may not have been as clear as I thought they were. Sorry this is quite long.
First, the only consumer of the reader locks (.in_use files) is the cleaner. No other component (neither clients nor the node manager) uses or checks these files. Furthermore, the only time the cleaner looks at these reader locks and the timestamps in their names is to prevent a race, as described in step W2 in the doc.
To determine whether a particular cached entry is stale, the cleaner first looks at the modification time of the directory that contains the cached file (more on the directory modification time later). Then the cleaner writes the cleaner lock to prevent clients from taking action, and proceeds to delete the cached entry. However, there is a window between the moment the cleaner determines the entry is stale and the moment it writes the cleaner lock. A client may come in at just the right time and start using the entry between those two points. For this reason, before proceeding to delete, the cleaner needs to double-check whether there has been any "recent" attempt to use the file, where "recent" means genuinely recent (a few seconds ago at most). The moment the cleaner detects a recent reader lock, it recognizes this race and skips the directory.
So the primary reason that these reader locks (and the associated timestamps) are needed is to prevent this race.
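As a concrete illustration, here is a minimal sketch of that race check, simulated on a local filesystem in Python. The real cleaner runs against HDFS, and the file names and thresholds here are hypothetical:

```python
import os
import time

def try_clean(entry_dir, staleness_secs, recent_secs):
    """Return True if the cached entry is safe to delete (sketch only)."""
    # Step 1: the entry looks stale if its containing directory has not
    # been modified for longer than the staleness threshold.
    if time.time() - os.path.getmtime(entry_dir) < staleness_secs:
        return False  # entry was used recently; skip it

    # Step 2: write the cleaner lock so clients back off.
    cleaner_lock = os.path.join(entry_dir, ".cleaner_lock")
    open(cleaner_lock, "w").close()

    # Step 3: double-check for a *recent* reader lock that snuck in
    # between steps 1 and 2. Reader locks are named '<timestamp>.in_use'.
    now = time.time()
    for name in os.listdir(entry_dir):
        if name.endswith(".in_use"):
            ts = float(name[: -len(".in_use")])
            if now - ts < recent_secs:
                os.remove(cleaner_lock)
                return False  # race detected; skip this directory

    # Safe to delete the cached entry (actual deletion elided).
    return True
```

The key point is that the reader-lock timestamps are consulted only in step 3, and only to catch a client that arrived inside the step 1/step 2 window.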
Another way of thinking about this: we could easily have come up with a design that does not need this timestamp at all, by relying on the directory modification time alone. The cleaner could check the directory modification time, write the cleaner lock, and then re-check the directory modification time to see whether there was any attempt to use the entry in between. In fact, I'm thinking I may want to modify the design to take that approach; the timestamp seems a bit confusing in terms of what it is used for. I'll probably make that change.
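A sketch of that mtime-only variant, again as a local-filesystem simulation. Two assumptions for illustration: the `after_lock` hook is just a test seam standing in for a client racing in, and the cleaner lock is placed outside the entry directory so that writing it does not itself bump the directory's modification time:

```python
import os
import time

def try_clean_mtime_only(entry_dir, staleness_secs, after_lock=lambda: None):
    """Safe-to-delete check using only the directory modification time
    (no timestamps encoded in reader-lock names). Sketch only."""
    mtime_before = os.path.getmtime(entry_dir)
    if time.time() - mtime_before < staleness_secs:
        return False  # entry was used recently

    # Write the cleaner lock. It lives *next to* the entry directory in
    # this sketch so creating it does not change the directory mtime.
    cleaner_lock = entry_dir.rstrip(os.sep) + ".cleaner_lock"
    open(cleaner_lock, "w").close()
    after_lock()  # a racing client would drop its reader lock here

    # Double-check: any client that raced in created a reader lock
    # inside the directory, which bumped its modification time.
    if os.path.getmtime(entry_dir) != mtime_before:
        os.remove(cleaner_lock)
        return False  # race detected; skip

    return True
```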
Now on to why I use the modification time of the directory that contains the cached entry (that is, the modification time of the directory on HDFS). I am using it to detect whether a new client came in and started using the cached entry. Since a client is required to drop a reader lock file into that directory, any use of the cached entry updates the modification time of the containing directory. It can therefore serve as a proxy for when the cached entry was last used. The timestamp is updated one more time when the client removes its reader lock at the end.
I originally considered using another file (like .last_used) to keep track of the last used time of cached entries. However, we were concerned about the impact of adding another file for each cached entry, and thus putting more pressure on the name node. Thus, we settled on looking at the modification time of the containing directory in lieu of that.
The only consumer of this information is also the cleaner.
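To make the proxy concrete, here is a small local-filesystem demonstration (HDFS bumps a directory's modification time on file creation and deletion in the same way; the lock file name is hypothetical):

```python
import os
import tempfile

def last_used_proxy_demo():
    """Show that dropping and removing a reader-lock file bumps the
    containing directory's mtime, so no separate .last_used file is needed."""
    d = tempfile.mkdtemp()
    os.utime(d, (0, 0))                        # pretend the entry is ancient
    before = os.path.getmtime(d)

    lock = os.path.join(d, "client-1.in_use")  # hypothetical lock name
    open(lock, "w").close()                    # client drops its reader lock
    after_create = os.path.getmtime(d)

    os.utime(d, (0, 0))
    os.remove(lock)                            # client releases at the end
    after_remove = os.path.getmtime(d)
    return before, after_create, after_remove
```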
With this, I'll add my comments to your points/questions.
I'm thinking of the general case of permissions - just because the job client has access to the local files during job submission does not mean the user wants all those files available to anyone with cluster access. It's probably less of an issue in practice if this is limited to just jars, but it's definitely an issue if this is expanded to other distcache file types (e.g.: data files for something like a map-side join).
Agreed. I need to add clarifying details here. If you set the boolean flag, the intent is that it covers only the job jar and the libjars, but it does not apply to other files (for example, -files or -archives are excluded from this). Let me know if you think that's reasonable. I'll add that clarification.
How is the case of orphaned temporary files any different than the orphaned read lock case? I would think the issue of staleness would apply there as well. If a temporary file is over a day old, it's highly likely to be orphaned. Nobody wants to wait a day to upload a distcache entry to HDFS, as it implies it would be on the same order of time to localize it later.
I see it's not explicitly stated, but checkAndUpload() may return the temp file instead of the jar file under some scenarios. This pertains to the error handling I mentioned in my previous comment. So it is possible that jobs may use these temp files directly. This is rather unlikely, but it is possible in theory. In that case, if the temp file were prematurely deleted, it could lead to localization failures.
Having said that, I think it may be possible to modify the logic of checkAndUpload() slightly so that it always returns the intended jar. It would make the algorithm a bit more involved (as it would entail retrying the rename), but I might be able to make that change. If that can be done, then any closed temp file can be considered safe for clean-up.
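For instance, the fallback could look something like this sketch. It is a local stand-in using hard links, since `os.link`, much like an HDFS rename that refuses to overwrite, fails when the destination already exists; the function name mirrors checkAndUpload() but the code is purely illustrative:

```python
import os

def check_and_upload(temp_path, final_path):
    """Always return the intended (final) jar path, never the temp file."""
    if os.path.exists(final_path):
        os.remove(temp_path)  # another client already published the jar
        return final_path
    try:
        # Atomically publish our copy. os.link raises FileExistsError if
        # the destination appeared in the meantime, analogous to an HDFS
        # rename failing because the target exists.
        os.link(temp_path, final_path)
        os.remove(temp_path)
        return final_path
    except FileExistsError:
        os.remove(temp_path)  # lost the race after the existence check
        return final_path
```

With this shape, callers never see a temp-file path, so temp files are always safe to clean up once closed.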
Speaking of long-running jobs, an alternative would be to use the YARN application ID (which clients grab just before submitting) as part of the read lock. Then the cleaner can query the ResourceManager to know for certain whether the job is still active.
This is a great point. I think the issue of long-running apps wasn't fully explored in the current version of the design. If there are apps that run beyond the specified staleness value, then the cleaner could erroneously clean up the cached entry, and it may result in localization failures. I think this needs to be addressed.
As you mentioned, probably the best way to solve this cleanly is to add the YARN app id so that the cleaner knows whether the app is active or not.
We originally shied away from using the app id because I wanted to keep the cleaner as minimal as possible, relying only on HDFS. But this may be a compelling reason to introduce the app id after all. I'll tinker with this and see if it works.
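If we go that route, the cleaner's staleness check could look roughly like this sketch. The lock-name format `<appId>.<timestamp>.in_use` and the `active_app_ids` set (which stands in for a ResourceManager query) are both hypothetical:

```python
import time

def is_lock_stale(lock_name, active_app_ids, staleness_secs, now=None):
    """Treat a reader lock as stale only if its application is no longer
    active *and* its timestamp is older than the staleness threshold."""
    app_id, ts, suffix = lock_name.rsplit(".", 2)
    assert suffix == "in_use"
    if app_id in active_app_ids:
        return False  # app still running: never consider the lock stale
    now = time.time() if now is None else now
    return now - float(ts) > staleness_secs
```

This way, long-running apps are protected for their whole lifetime, and the timestamp heuristic only applies to apps the RM no longer knows about.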
That would not be OK. The last job to initiate a reference on a distcache file is not necessarily going to be the last one to relinquish that reference. Job A starts first but is long-running, and job B starts later but is very quick. We do not want to delete job A's reference because it's older than job B. Otherwise job A could easily fail after job B completes if new tasks (think reducers or failed maps) are later launched on nodes that have not localized those distcache entries yet.
You're right that this type of inversion can happen. But I think there are also mitigating circumstances. This goes back to the observation that the only consumer of the reader locks is the cleaner, and that it looks at them only to determine whether the aforementioned race has occurred. So even if the inversion occurs, it has an impact only when combined with that race occurring. I think this could be addressed easily by performing this reader-lock clean-up only on recent cached entries.
At any rate, if we end up introducing the app id, this problem may be solved along the way as well.
Is the directory timestamp that important? We're localizing files (jars in this case), not directories. A stable timestamp of the file being localized is key to preventing unnecessary re-localization, but I don't see why that would be changing.
I hope I answered this question by explaining why we're using the modification time of the containing directory.