[SPARK-14468] Always enable OutputCommitCoordinator - ASF JIRA

Details

Type: Bug
Status: Resolved
Priority: Major
Resolution: Fixed
Affects Version/s: None
Fix Version/s: 1.4.2, 1.5.2, 1.6.2, 2.0.0
Component/s: Spark Core
Labels: None
Target Version/s: 1.4.2, 1.5.2, 1.6.2, 2.0.0

Description

The OutputCommitCoordinator was originally introduced in SPARK-4879 because speculation causes the output of some partitions to be deleted. However, as we can see in SPARK-10063, speculation is not the only case where this can happen.

More specifically, when we retry a stage we're not guaranteed to kill the tasks that are still running (we don't even interrupt their threads), so we may end up with multiple concurrent task attempts for the same task. This leads to problems like SPARK-8029; this fix is necessary for resolving them but not sufficient on its own.

In general, when we run into situations like these, we need the OutputCommitCoordinator because we don't control what the underlying file system does. Enabling it doesn't incur heavy performance costs, so there's little reason not to always enable it to ensure correctness.
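
To illustrate why a coordinator helps when several attempts of the same task run concurrently, here is a minimal sketch of first-committer-wins arbitration. It is a toy model, not Spark's implementation: the class `CommitCoordinator` and its methods `can_commit` and `attempt_failed` are hypothetical names chosen for this example, and the stage, partition, and attempt numbers are made up.

```python
import threading


class CommitCoordinator:
    """Toy first-committer-wins arbiter (hypothetical; not Spark's class)."""

    def __init__(self):
        self._lock = threading.Lock()
        # Maps (stage, partition) to the attempt number authorized to commit.
        self._authorized = {}

    def can_commit(self, stage, partition, attempt):
        """Return True only for the first attempt that asks for this partition."""
        with self._lock:
            winner = self._authorized.setdefault((stage, partition), attempt)
            return winner == attempt

    def attempt_failed(self, stage, partition, attempt):
        """Release the authorization if the authorized attempt failed."""
        with self._lock:
            if self._authorized.get((stage, partition)) == attempt:
                del self._authorized[(stage, partition)]


coordinator = CommitCoordinator()
results = {}


def run_attempt(attempt):
    # Each concurrent attempt asks for permission before committing output.
    results[attempt] = coordinator.can_commit(stage=0, partition=3, attempt=attempt)


# Simulate four concurrent attempts of the same task (same stage and partition).
threads = [threading.Thread(target=run_attempt, args=(a,)) for a in range(4)]
for t in threads:
    t.start()
for t in threads:
    t.join()

winners = [a for a, ok in results.items() if ok]
```

In this sketch exactly one attempt is allowed to commit no matter how the threads interleave, so the output of a partition is written once regardless of what the underlying file system does with concurrent writers. If the authorized attempt fails, `attempt_failed` frees the slot so a later attempt can commit.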

Issue Links

links to: [GitHub] Pull Request #12244 (andrewor14)

People

Assignee: Andrew Or
Reporter: Andrew Or
Votes: 0
Watchers: 2

Dates

Created: 07/Apr/16 22:10
Updated: 08/Apr/16 00:51
Resolved: 08/Apr/16 00:51