[SPARK-6614] OutputCommitCoordinator should clear authorized committers only after authorized committer fails, not after any failure - ASF JIRA

Details

Type: Bug
Status: Resolved
Priority: Major
Resolution: Fixed
Affects Version/s: 1.3.0, 1.3.1, 1.4.0
Fix Version/s: 1.3.1, 1.4.0
Component/s: Scheduler, Spark Core
Labels: None

Description

OutputCommitCoordinator contains logic that clears the authorized committer's lock on committing if that committer fails. However, the current code also clears this lock when other tasks fail, which is a bug: https://github.com/apache/spark/blob/df3550084c9975f999ed370dd9f7c495181a68ba/core/src/main/scala/org/apache/spark/scheduler/OutputCommitCoordinator.scala#L118. In theory, this could allow a new committer to start, run to completion, and commit output before the authorized committer has finished. The race is unlikely to occur often in practice, because exposing it requires a complex combination of failure and timing conditions. Still, we should fix this issue.

This was discovered by adav while reading the OutputCommitCoordinator code.
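
To illustrate the distinction, here is a minimal sketch written as a toy Python model, not Spark's actual Scala code. The class and method names (`CommitCoordinatorSketch`, `task_failed_buggy`, `task_failed_fixed`) are hypothetical, and the model assumes the simplest possible policy: the first attempt to ask for a partition is authorized, and every later request is denied while the lock is held.

```python
from typing import Dict


class CommitCoordinatorSketch:
    """Toy model of per-partition commit authorization (assumed, simplified)."""

    def __init__(self) -> None:
        # partition id -> attempt id currently authorized to commit
        self._authorized: Dict[int, int] = {}

    def can_commit(self, partition: int, attempt: int) -> bool:
        # First asker wins the lock; any request made while it is held is denied.
        if partition not in self._authorized:
            self._authorized[partition] = attempt
            return True
        return False

    def task_failed_buggy(self, partition: int, attempt: int) -> None:
        # Behavior described in this issue: ANY failure clears the lock,
        # even if the failed attempt was not the authorized committer.
        self._authorized.pop(partition, None)

    def task_failed_fixed(self, partition: int, attempt: int) -> None:
        # Intended behavior: clear the lock only when the failed attempt
        # is the one that currently holds it.
        if self._authorized.get(partition) == attempt:
            del self._authorized[partition]


if __name__ == "__main__":
    buggy = CommitCoordinatorSketch()
    buggy.can_commit(0, 1)          # attempt 1 becomes the authorized committer
    buggy.task_failed_buggy(0, 2)   # an unrelated attempt fails
    print(buggy.can_commit(0, 3))   # attempt 3 is authorized while attempt 1 may still commit

    fixed = CommitCoordinatorSketch()
    fixed.can_commit(0, 1)
    fixed.task_failed_fixed(0, 2)   # unrelated failure leaves the lock alone
    print(fixed.can_commit(0, 3))   # attempt 3 is denied
```

In the buggy variant, the failure of attempt 2 releases the lock held by attempt 1, so attempt 3 can be authorized while attempt 1 is still running. In the fixed variant the lock is released only when attempt 1 itself fails.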

Issue Links

links to

[Github] Pull Request #5276 (JoshRosen)

People

Assignee: Josh Rosen

Reporter: Josh Rosen

Votes: 0

Watchers: 2

Dates

Created: 30/Mar/15 22:05

Updated: 17/May/20 17:47

Resolved: 31/Mar/15 23:22