[SPARK-32701] mapreduce.fileoutputcommitter.algorithm.version default depends on runtime environment - ASF JIRA

Details

Type: Improvement
Status: Resolved
Priority: Minor
Resolution: Fixed
Affects Version/s: 2.4.0, 3.0.0
Fix Version/s: 3.0.1, 3.1.0
Component/s: Documentation
Labels: None

Description

The documentation in its current state implies that the default value of spark.hadoop.mapreduce.fileoutputcommitter.algorithm.version is 1, which is not entirely accurate.

Spark doesn't explicitly set this configuration; the value is instead inherited from Hadoop's FileOutputCommitter class. The default was 1 until Hadoop 3.0, where it changed to 2.

I'm proposing that we clarify that the default of this value depends on the Hadoop version in the user's runtime environment:

1 for Hadoop < 3.0
2 for Hadoop >= 3.0
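
The rule above can be sketched in a few lines of Python. This is only an illustration of the mapping described in this issue, not code from Spark or Hadoop: the helper names inherited_default and pinned_conf are hypothetical, and the version cutoff is taken from the description as written. The second helper shows the key/value pair a job could set to avoid depending on the inherited default.

```python
import re

# Configuration key named in this issue.
CONF_KEY = "spark.hadoop.mapreduce.fileoutputcommitter.algorithm.version"


def inherited_default(hadoop_version: str) -> int:
    """Return the algorithm version Spark would inherit, per the rule in this issue.

    Hypothetical helper: 1 for Hadoop < 3.0, 2 for Hadoop >= 3.0.
    """
    match = re.match(r"(\d+)", hadoop_version)
    if match is None:
        raise ValueError(f"unrecognized Hadoop version: {hadoop_version!r}")
    major = int(match.group(1))
    return 2 if major >= 3 else 1


def pinned_conf(version: int) -> dict:
    """Build the setting that pins the algorithm version explicitly.

    Hypothetical helper: returns a one-entry dict with a string value.
    """
    if version not in (1, 2):
        raise ValueError("algorithm version must be 1 or 2")
    return {CONF_KEY: str(version)}


if __name__ == "__main__":
    for hv in ("2.7.4", "3.2.0"):
        print(hv, "->", inherited_default(hv))
    print(pinned_conf(1))
```

Setting the key explicitly (for example through --conf on spark-submit or on the session builder) makes the behavior independent of the Hadoop version found at runtime.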

There are also plans to revert this default to v1, so it may also be useful to reference this JIRA:
https://issues.apache.org/jira/browse/MAPREDUCE-7282

Issue Links

links to

[Github] Pull Request #29541 (waleedfateem)

People

Assignee: Waleed Fateem

Reporter: Waleed Fateem

Votes: 0

Watchers: 2

Dates

Created: 25/Aug/20 18:44

Updated: 15/Nov/21 18:26

Resolved: 27/Aug/20 14:06