Spark / SPARK-14468

Always enable OutputCommitCoordinator

    Details

    • Type: Bug
    • Status: Resolved
    • Priority: Major
    • Resolution: Fixed
    • Affects Version/s: None
    • Fix Version/s: 1.4.2, 1.5.2, 1.6.2, 2.0.0
    • Component/s: Spark Core
    • Labels: None

      Description

      The OutputCommitCoordinator was originally introduced in SPARK-4879 because speculation can cause the output of some partitions to be deleted. However, as SPARK-10063 shows, speculation is not the only case where this can happen.

      More specifically, when we retry a stage we are not guaranteed to kill the tasks that are still running (we don't even interrupt their threads), so we may end up with multiple concurrent task attempts for the same task. This leads to problems like SPARK-8029. Always enabling the coordinator is necessary to fix them, but not sufficient on its own.

      In general, we need the OutputCommitCoordinator in situations like these because we don't control what the underlying file system does. Enabling it doesn't incur a heavy performance cost, so there is little reason not to always enable it to ensure correctness.
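
To illustrate why a coordinator helps when several attempts of the same task run concurrently, here is a minimal toy model of "first committer wins" arbitration. It is a sketch only, not Spark's implementation: the class `CommitCoordinator` and its methods `can_commit` and `attempt_failed` are hypothetical names invented for this example, and the real OutputCommitCoordinator arbitrates between executors and the driver rather than within one process.

```python
import threading


class CommitCoordinator:
    """Toy first-committer-wins arbiter (hypothetical; not Spark's class)."""

    def __init__(self):
        self._lock = threading.Lock()
        # Maps (stage, partition) to the attempt number authorized to commit.
        self._authorized = {}

    def can_commit(self, stage, partition, attempt):
        """Return True only for the attempt that holds the commit right."""
        with self._lock:
            # The first attempt to ask claims the partition; later ones lose.
            winner = self._authorized.setdefault((stage, partition), attempt)
            return winner == attempt

    def attempt_failed(self, stage, partition, attempt):
        """Release the commit right if the authorized attempt failed."""
        with self._lock:
            if self._authorized.get((stage, partition)) == attempt:
                del self._authorized[(stage, partition)]


coordinator = CommitCoordinator()
# Two concurrent attempts of the same task, e.g. after a stage retry.
first = coordinator.can_commit(stage=1, partition=0, attempt=0)
second = coordinator.can_commit(stage=1, partition=0, attempt=1)
# The authorized attempt fails, so another attempt may now commit.
coordinator.attempt_failed(stage=1, partition=0, attempt=0)
retry = coordinator.can_commit(stage=1, partition=0, attempt=1)
print(first, second, retry)  # True False True
```

Because at most one attempt per partition is ever told it may commit, a lingering attempt from a retried stage cannot overwrite or delete output that another attempt has already committed, whatever the underlying file system does.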

            People

            • Assignee: Andrew Or (andrewor14)
            • Reporter: Andrew Or (andrewor14)
            • Votes: 0
            • Watchers: 2