Spark / SPARK-37948

Disable mapreduce.fileoutputcommitter.algorithm.version=2 by default

Details

    • Type: Bug
    • Status: Resolved
    • Priority: Major
    • Resolution: Won't Fix
    • Affects Version/s: 3.2.0
    • Fix Version/s: None
    • Component/s: Spark Core
    • Labels: None

    Description

      The Hadoop MR v2 commit algorithm has a correctness issue, described in SPARK-33019, which is why spark.hadoop.mapreduce.fileoutputcommitter.algorithm.version was changed to 1 by default. However, some Spark users like me were unaware of this correctness issue and had been using the v2 commit algorithm in Spark 2.x for performance reasons. After upgrading to Spark 3.x, we hit this correctness issue in our production environment, and it caused a very serious failure. The issue appeared to trigger more often in Spark 3.x, although I didn't investigate the specific reasons. So I propose that we disable spark.hadoop.mapreduce.fileoutputcommitter.algorithm.version=2 by default: if a user selects the v2 commit algorithm, fail the job and warn the user about this correctness issue. Users who still want v2 could force it through a new configuration.
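The proposed behavior could look roughly like the sketch below. It is illustrative only and does not reflect anything Spark implements (this issue was resolved as Won't Fix). Settings are modelled as a plain dict rather than a real SparkConf, an unset version is treated as 1 following the default the description refers to, and the force key name FORCE_V2_KEY is a hypothetical name invented for this example.

```python
# Sketch of the proposed check, using a plain dict in place of a SparkConf.
VERSION_KEY = "spark.hadoop.mapreduce.fileoutputcommitter.algorithm.version"
# Hypothetical opt-in key: no such Spark configuration exists.
FORCE_V2_KEY = "spark.hadoop.mapreduce.fileoutputcommitter.algorithm.version.v2.force"


class UnsafeCommitAlgorithmError(ValueError):
    """Raised when the v2 commit algorithm is requested without an explicit opt-in."""


def check_commit_algorithm(conf):
    """Return the commit algorithm version to use; raise if v2 is set but not forced."""
    # Assumption: an unset version means the v1 default.
    version = str(conf.get(VERSION_KEY, "1")).strip()
    if version != "2":
        return version

    # v2 was requested: only allow it when the (hypothetical) force flag is true.
    forced = str(conf.get(FORCE_V2_KEY, "false")).strip().lower() == "true"
    if not forced:
        raise UnsafeCommitAlgorithmError(
            f"{VERSION_KEY}=2 has a known correctness issue (see SPARK-33019). "
            f"Use version 1, or set {FORCE_V2_KEY}=true to accept the risk."
        )
    return version


if __name__ == "__main__":
    print(check_commit_algorithm({}))
    print(check_commit_algorithm({VERSION_KEY: "2", FORCE_V2_KEY: "true"}))
    try:
        check_commit_algorithm({VERSION_KEY: "2"})
    except UnsafeCommitAlgorithmError as err:
        print(f"job rejected: {err}")
```

Failing early, before any output is written, is the point of the proposal: a user who carried the v2 setting over from Spark 2.x would see the warning at job submission instead of discovering the problem in production.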

          People

            Assignee: Unassigned
            Reporter: sleep1661 hujiahua
            Votes: 0
            Watchers: 2
