Details
Type: Improvement
Status: Resolved
Priority: Minor
Resolution: Fixed
2.4.0, 3.0.0
None
Description
The documentation in its current state leads readers to assume that the default value of spark.hadoop.mapreduce.fileoutputcommitter.algorithm.version is 1, which is not entirely accurate.
Spark doesn't explicitly set this configuration; the value is inherited from Hadoop's FileOutputCommitter class. The default was 1 until Hadoop 3.0, where it changed to 2.
I propose that we clarify that the default depends on the Hadoop version in the user's runtime environment:
1 for Hadoop < 3.0
2 for Hadoop >= 3.0
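The rule above can be sketched in a few lines of Python. This is only an illustration of the version-dependent default described in this issue, not code from Spark or Hadoop; the helper names `default_algorithm_version` and `pinned_conf` are hypothetical, and the version parsing assumes a dotted string such as "2.7.4" or "3.0.0".

```python
# Configuration key named in this issue.
COMMITTER_KEY = "spark.hadoop.mapreduce.fileoutputcommitter.algorithm.version"


def default_algorithm_version(hadoop_version: str) -> int:
    """Return the default stated above: 1 before Hadoop 3.0, 2 from 3.0 on.

    Hypothetical helper; assumes the major version is the first dotted field.
    """
    major = int(hadoop_version.split(".")[0])
    return 1 if major < 3 else 2


def pinned_conf(version: int) -> dict:
    """Build a config entry that sets the version explicitly.

    Hypothetical helper; pinning avoids relying on the inherited default.
    """
    if version not in (1, 2):
        raise ValueError("algorithm version must be 1 or 2")
    return {COMMITTER_KEY: str(version)}
```

Users who do not want the behavior to vary with the Hadoop version can set the key explicitly, for example by passing the entry from `pinned_conf(1)` as a `--conf key=value` argument to spark-submit.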
There are also plans to revert this default to v1, so it may also be useful to reference this JIRA:
https://issues.apache.org/jira/browse/MAPREDUCE-7282