Spark / SPARK-24589

OutputCommitCoordinator may allow duplicate commits

    Details

    • Type: Bug
    • Status: Resolved
    • Priority: Blocker
    • Resolution: Fixed
    • Affects Version/s: 2.2.1, 2.3.1
    • Fix Version/s: 2.1.3, 2.2.2, 2.3.2, 2.4.0
    • Component/s: Spark Core
    • Labels:
      None

      Description

      This is a sibling bug to SPARK-24552. While investigating the source of that bug, it was found that the output commit coordinator currently allows duplicate commits when there are stage retries and tasks with the same task attempt number (one in each stage attempt that currently has running tasks) try to commit their output.

      This can lead to duplicate data in the output.
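
      The failure mode described above can be illustrated with a toy model. This is a minimal sketch, not Spark's actual OutputCommitCoordinator code: the class names, the `can_commit` signature, and the `simulate` helper are all hypothetical. It assumes a coordinator that remembers the authorized committer for a partition by task attempt number alone, and answers repeated requests from that same attempt number idempotently. Under that assumption, two tasks from different stage attempts that share an attempt number are both allowed to commit. The second class shows one possible way to close the gap, by including the stage attempt in the committer's identity.

      ```python
      from typing import Dict, List, Tuple


      class NaiveCoordinator:
          """Toy model (hypothetical): identifies the committer by task attempt number only."""

          def __init__(self) -> None:
              # (stage, partition) -> attempt number of the authorized committer
              self._authorized: Dict[Tuple[int, int], int] = {}

          def can_commit(self, stage: int, stage_attempt: int,
                         partition: int, attempt_number: int) -> bool:
              # stage_attempt is ignored here, so tasks from different stage
              # attempts with equal attempt numbers look like the same task.
              holder = self._authorized.setdefault((stage, partition), attempt_number)
              return holder == attempt_number


      class StageAwareCoordinator:
          """Toy model (hypothetical): identifies the committer by (stage attempt, task attempt)."""

          def __init__(self) -> None:
              # (stage, partition) -> (stage attempt, attempt number) of the committer
              self._authorized: Dict[Tuple[int, int], Tuple[int, int]] = {}

          def can_commit(self, stage: int, stage_attempt: int,
                         partition: int, attempt_number: int) -> bool:
              ident = (stage_attempt, attempt_number)
              holder = self._authorized.setdefault((stage, partition), ident)
              return holder == ident


      def simulate(coordinator) -> List[bool]:
          # Two tasks for the same partition, one per stage attempt,
          # both carrying task attempt number 0.
          requests = [(0, 0), (1, 0)]  # (stage_attempt, attempt_number)
          return [
              coordinator.can_commit(stage=1, stage_attempt=sa,
                                     partition=0, attempt_number=an)
              for sa, an in requests
          ]


      naive_result = simulate(NaiveCoordinator())       # both requests granted
      aware_result = simulate(StageAwareCoordinator())  # only the first is granted
      ```

      In the naive model both requests are granted, which corresponds to the duplicate commit; in the stage-aware model the second request is denied.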

            People

            • Assignee:
              vanzin Marcelo Vanzin
              Reporter:
              vanzin Marcelo Vanzin
