[SPARK-3495] Block replication fails continuously when the replication target node is dead - ASF JIRA

Details

Type: Bug
Status: Resolved
Priority: Critical
Resolution: Fixed
Affects Version/s: 1.0.2, 1.1.0
Fix Version/s: 1.1.1, 1.2.0
Component/s: Block Manager, DStreams, Spark Core
Labels: None
Target Version/s: 1.1.1, 1.2.0

Description

If a block manager (say, A) wants to replicate a block and the node chosen for replication (say, B) is dead, then the attempt to send the block to B fails. However, the attempts then keep failing indefinitely: even after the driver learns that B is dead, A continues to pick B as the replication target and fails every time.

The reason behind this bug is that A fetches its list of peers from the driver once (while B was still active) and never updates it after B dies. This affects Spark Streaming, because its receivers use block replication.
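The stale peer list described above can be illustrated with a small, language-neutral model. This is only a sketch of one possible remedy (expire the cached list after a TTL and re-fetch it after a failed send). It is not the code from the linked pull requests, and every name in it (`PeerCache`, `replicate`, `ttl_seconds`, `fetch_peers`, `send`) is hypothetical rather than part of Spark's API.

```python
import time


class PeerCache:
    """Caches a peer list obtained from a driver-like source (hypothetical model)."""

    def __init__(self, fetch_peers, ttl_seconds, clock=time.monotonic):
        self._fetch_peers = fetch_peers  # callable returning the current live peers
        self._ttl = ttl_seconds
        self._clock = clock
        self._peers = []
        self._fetched_at = None

    def get(self, force_refresh=False):
        """Return the cached peers, re-fetching if forced, never fetched, or expired."""
        now = self._clock()
        expired = self._fetched_at is None or now - self._fetched_at >= self._ttl
        if force_refresh or expired:
            self._peers = list(self._fetch_peers())
            self._fetched_at = now
        return list(self._peers)


def replicate(block_id, cache, send, max_failures=1):
    """Send the block to the first untried peer; re-fetch the peer list after a failure.

    Returns the peer that accepted the block, or None if no peer could be reached.
    """
    failures = 0
    tried = set()
    peers = cache.get()
    while failures <= max_failures:
        candidates = [p for p in peers if p not in tried]
        if not candidates:
            return None
        peer = candidates[0]
        tried.add(peer)
        try:
            send(peer, block_id)
            return peer
        except ConnectionError:
            failures += 1
            # Without this refresh, the cached list would keep offering the dead peer.
            peers = cache.get(force_refresh=True)
    return None


# Demo: A caches [B, C] while B is alive, then B dies.
live = {"B", "C"}


def fetch_peers():
    return sorted(live)


def send(peer, block_id):
    if peer not in live:
        raise ConnectionError(f"{peer} is unreachable")


cache = PeerCache(fetch_peers, ttl_seconds=60.0)
cache.get()          # cached while B is still active
live.discard("B")    # B dies; the cache still holds ["B", "C"]
print(replicate("block-1", cache, send))  # prints C
```

In this model the first send goes to the dead peer B and fails, the forced refresh drops B from the list, and the retry succeeds on C. Whether to refresh on a timer, on failure, or both is a design choice; the sketch does both only to show each mechanism.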

Issue Links

is duplicated by

SPARK-3498 Block always replicated to the same node

Closed

links to

[Github] Pull Request #2366 (tdas)

[Github] Pull Request #3191 (tdas)

People

Assignee: Tathagata Das

Reporter: Tathagata Das

Votes: 0

Watchers: 2

Dates

Created: 12/Sep/14 01:01

Updated: 11/Nov/14 02:37

Resolved: 02/Oct/14 20:50