  1. Spark
  2. SPARK-21400

Spark shouldn't ignore user defined output committer in append mode

    Details

    • Type: Bug
    • Status: Closed
    • Priority: Major
    • Resolution: Fixed
    • Affects Version/s: 2.2.0
    • Fix Version/s: 2.3.0
    • Component/s: SQL
    • Labels: None

    Description

    In https://issues.apache.org/jira/browse/SPARK-8578 we decided to override user-defined output committers in append mode. The reasoning was that some output committers can lead to correctness issues. Since then, we have removed DirectParquetOutputCommitter (the biggest known offender) from the codebase and rely on the default implementations.

    I believe we shouldn't restrict this anymore: users who override this configuration should be expected to have tested their committer for correctness. This unblocks the use of more sophisticated and performant output committers without the need to override file format implementations.
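To make the change in behaviour concrete, here is a simplified, hypothetical model of the committer-selection rule described above. It is not Spark's actual implementation. The function name `choose_committer`, the flag `honor_user_committer_in_append`, the class name `com.example.MyCommitter`, and the use of Hadoop's `FileOutputCommitter` as the default are all illustrative assumptions.

```python
from typing import Optional

# Assumption: stand-in for the file format's default committer
# (in Hadoop-based writes this is typically FileOutputCommitter).
DEFAULT_COMMITTER = "FileOutputCommitter"


def choose_committer(
    user_committer: Optional[str],
    is_append: bool,
    honor_user_committer_in_append: bool,
) -> str:
    """Return the name of the committer a write would use (hypothetical model).

    honor_user_committer_in_append=False models the behaviour introduced by
    SPARK-8578 (the user's committer is ignored when appending);
    True models the behaviour this issue proposes.
    """
    if user_committer is None:
        # No user-defined committer configured: use the default.
        return DEFAULT_COMMITTER
    if is_append and not honor_user_committer_in_append:
        # Old behaviour: append mode silently overrides the user's choice.
        return DEFAULT_COMMITTER
    # Otherwise the user-defined committer is used.
    return user_committer


if __name__ == "__main__":
    custom = "com.example.MyCommitter"  # hypothetical class name
    for honor in (False, True):
        for append in (False, True):
            chosen = choose_committer(custom, append, honor)
            print(f"honor={honor!s:5} append={append!s:5} -> {chosen}")
```

The only difference between the two behaviours is the append branch: with the restriction lifted, a configured committer is used for both overwrite and append writes, and responsibility for its correctness moves to the user.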

    People

    • Assignee: Unassigned
    • Reporter: Robert Kruszewski (robert3005)
    • Votes: 0
    • Watchers: 4
