Spark / SPARK-32701

mapreduce.fileoutputcommitter.algorithm.version default depends on runtime environment


Details

    • Type: Improvement
    • Status: Resolved
    • Priority: Minor
    • Resolution: Fixed
    • Affects Version/s: 2.4.0, 3.0.0
    • Fix Version/s: 3.0.1, 3.1.0
    • Component/s: Documentation
    • Labels: None

    Description

      The documentation in its current state suggests that the default value of spark.hadoop.mapreduce.fileoutputcommitter.algorithm.version is 1, which is not entirely accurate.

      Spark doesn't explicitly set this configuration; the value is inherited from Hadoop's FileOutputCommitter class. The default is 1 before Hadoop 3.0 and 2 from Hadoop 3.0 onward.

      I propose that we clarify that the default depends on the Hadoop version in the user's runtime environment:

      1 for < Hadoop 3.0
      2 for >= Hadoop 3.0

      There are also plans to revert this default to v1, so it may be useful to reference this JIRA as well:
      https://issues.apache.org/jira/browse/MAPREDUCE-7282
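
The version-dependent default described above can be sketched in a few lines of Python. This is only an illustration of the rule as stated in this issue (1 before Hadoop 3.0, 2 from 3.0 onward). The helper names `default_algorithm_version` and `pinned_conf` are hypothetical and are not part of Spark or Hadoop. Only the configuration key itself comes from the issue.

```python
# Hypothetical helpers illustrating the rule described in this issue.
# Neither function is a Spark or Hadoop API.

CONF_KEY = "spark.hadoop.mapreduce.fileoutputcommitter.algorithm.version"


def default_algorithm_version(hadoop_version: str) -> int:
    """Return the inherited default per this issue: 1 for Hadoop < 3.0, else 2."""
    # Only the major version matters for the rule stated in the issue.
    major = int(hadoop_version.split(".")[0])
    return 1 if major < 3 else 2


def pinned_conf(version: int) -> dict:
    """Build a config entry that sets the algorithm version explicitly."""
    if version not in (1, 2):
        raise ValueError("algorithm version must be 1 or 2")
    return {CONF_KEY: str(version)}


if __name__ == "__main__":
    for hadoop in ("2.7.4", "3.2.1"):
        print(hadoop, "->", default_algorithm_version(hadoop))
    print(pinned_conf(1))
```

Setting the value explicitly, for example with `--conf spark.hadoop.mapreduce.fileoutputcommitter.algorithm.version=1` on `spark-submit`, removes the dependence on whichever Hadoop version is present at runtime.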

       

          People

            Assignee: waleedfateem Waleed Fateem
            Reporter: waleedfateem Waleed Fateem