[SPARK-15352] Topology aware block replication - ASF JIRA

Details

Type: New Feature
Status: Resolved
Priority: Major
Resolution: Fixed
Affects Version/s: None
Fix Version/s: 2.2.0
Component/s: Block Manager, Mesos, Spark Core, YARN
Labels: None
Target Version/s: 2.2.0

Description

Cached RDDs let Spark serve online analytics, where it responds to interactive queries. In such use cases, losing RDD partitions to node or executor failures can cause long delays, because the lost data has to be regenerated.
Cached RDDs, even with multiple replicas per block, are not currently resilient to node failures when multiple executors are started on the same node. Block replication currently chooses a peer at random, and that peer may be on the same host as the original block, so a single node failure can take out every copy.
This effort would add topology-aware replication to Spark, enabled through pluggable strategies. For ease of development and review, the work is broken down into three major efforts:
1. Making peer selection for replication pluggable
2. Providing pluggable implementations for supplying topology information and for topology-aware replication
3. Proactive replenishment of lost blocks
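
The three efforts above can be illustrated with a small, self-contained Python sketch. This is not Spark's implementation or API: `Peer`, `random_policy`, `make_topology_aware_policy`, and `replenish` are hypothetical names, and the ranking rule (prefer another rack, then another host, and use the source's own host only as a last resort) is an assumption chosen for illustration.

```python
import random
from dataclasses import dataclass
from typing import Callable, List, Sequence


@dataclass(frozen=True)
class Peer:
    """Hypothetical stand-in for an executor's block manager."""
    executor_id: str
    host: str


# Hypothetical plug-in points: a topology mapper turns a host name into a
# label such as a rack, and a policy picks replication targets.
TopologyMapper = Callable[[str], str]
ReplicationPolicy = Callable[[Peer, Sequence[Peer], int, random.Random], List[Peer]]


def random_policy(source, peers, num_replicas, rng):
    """Baseline from the description: any peer, possibly on the source's host."""
    candidates = [p for p in peers if p != source]
    return rng.sample(candidates, min(num_replicas, len(candidates)))


def make_topology_aware_policy(mapper: TopologyMapper) -> ReplicationPolicy:
    """Build a policy that spreads replicas across racks and hosts (assumed rule)."""
    def policy(source, peers, num_replicas, rng):
        candidates = [p for p in peers if p != source]
        rng.shuffle(candidates)  # random tie-breaking within a rank
        source_rack = mapper(source.host)
        # Stable sort: peers on a different rack come first.
        candidates.sort(key=lambda p: 0 if mapper(p.host) != source_rack else 1)

        chosen: List[Peer] = []
        used_hosts = {source.host}
        # First pass: at most one replica per host, never the source's host.
        for p in candidates:
            if len(chosen) == num_replicas:
                break
            if p.host not in used_hosts:
                chosen.append(p)
                used_hosts.add(p.host)
        # Second pass: if hosts ran out, fall back to hosts already in use.
        for p in candidates:
            if len(chosen) == num_replicas:
                break
            if p not in chosen:
                chosen.append(p)
        return chosen
    return policy


def replenish(policy, holders, live_peers, desired_copies, rng):
    """Pick extra peers so a block returns to desired_copies after a failure."""
    live_holders = [h for h in holders if h in live_peers]
    missing = desired_copies - len(live_holders)
    if missing <= 0 or not live_holders:
        return []  # nothing to do, or no surviving copy to replicate from
    candidates = [p for p in live_peers if p not in live_holders]
    return policy(live_holders[0], candidates, missing, rng)
```

Because both policies share one call signature, swapping `random_policy` for a topology-aware one requires no change at the call site, which is the point of making peer selection pluggable. Note that `replenish` in this sketch only avoids the host of the first surviving holder; a fuller version would take every surviving replica's location into account.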

Issue Links

links to

[Github] Pull Request #17519 (lins05)

Sub-Tasks

There are no Sub-Tasks for this issue.

People

Assignee: Shubham Chopra

Reporter: Shubham Chopra

Votes: 1

Watchers: 14

Dates

Created: 16/May/16 22:16

Updated: 02/Jun/17 17:12

Resolved: 02/Jun/17 17:10