In Hadoop MapReduce, tasks call FileOutputFormat.setWorkOutputPath() after configuring the output committer, which sets the mapred.work.output.dir property that legacy output formats later read back via FileOutputFormat.getWorkOutputPath(): https://github.com/apache/hadoop/blob/a55d6bba71c81c1c4e9d8cd11f55c78f10a548b0/hadoop-mapreduce-project/hadoop-mapreduce-client/hadoop-mapreduce-client-core/src/main/java/org/apache/hadoop/mapred/Task.java#L611
Spark does not make this call, so certain legacy output formats can fail to work out-of-the-box on Spark. In particular, org.apache.parquet.hadoop.mapred.DeprecatedParquetOutputFormat can fail with a NullPointerException, apparently because it resolves its output file against FileOutputFormat.getWorkOutputPath(), which returns null when mapred.work.output.dir has never been set.
It looks like someone on GitHub has hit the same problem: https://gist.github.com/themodernlife/e3b07c23ba978f6cc98b73e3f3609abe
Tez had a very similar bug: https://issues.apache.org/jira/browse/TEZ-3348
We might be able to fix this by having Spark mimic Hadoop's logic and set the work output path before running the task. I'm unsure whether that change would pose compatibility risks for other existing workloads, though.
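For illustration, here is a minimal, self-contained sketch of the mechanism using stand-in classes. FakeJobConf, FakeFileOutputFormat, WorkOutputPathDemo, and resolveOutputFile are all invented simplifications, not the real Hadoop or Parquet APIs; the mapred.work.output.dir key is, to the best of my understanding, the property that the real setWorkOutputPath()/getWorkOutputPath() helpers use.

```java
import java.nio.file.Paths;
import java.util.HashMap;
import java.util.Map;

// Stand-in for Hadoop's JobConf (invented for illustration): just a
// string-to-string property map.
class FakeJobConf {
    private final Map<String, String> props = new HashMap<>();
    String get(String key) { return props.get(key); }
    void set(String key, String value) { props.put(key, value); }
}

// Simplified stand-ins for the real FileOutputFormat helpers, which
// read and write the "mapred.work.output.dir" property.
class FakeFileOutputFormat {
    static final String WORK_OUTPUT_DIR = "mapred.work.output.dir";

    static void setWorkOutputPath(FakeJobConf conf, String path) {
        conf.set(WORK_OUTPUT_DIR, path);
    }

    // Returns null if nothing ever set the property -- the situation a
    // legacy output format can hit on Spark.
    static String getWorkOutputPath(FakeJobConf conf) {
        return conf.get(WORK_OUTPUT_DIR);
    }
}

public class WorkOutputPathDemo {
    // Mimics a legacy output format that resolves its output file
    // relative to the work output path (as DeprecatedParquetOutputFormat
    // appears to do).
    static String resolveOutputFile(FakeJobConf conf, String fileName) {
        String workPath = FakeFileOutputFormat.getWorkOutputPath(conf);
        // Paths.get(null, ...) throws NullPointerException, mirroring
        // the NPE seen when mapred.work.output.dir is unset.
        return Paths.get(workPath, fileName).toString();
    }

    public static void main(String[] args) {
        FakeJobConf conf = new FakeJobConf();

        // Without the Hadoop-style setWorkOutputPath() call, the
        // legacy format blows up:
        try {
            resolveOutputFile(conf, "part-00000.parquet");
        } catch (NullPointerException e) {
            System.out.println("NPE: work output path never set");
        }

        // What Hadoop's task-side code does before invoking user code:
        FakeFileOutputFormat.setWorkOutputPath(conf,
            "/out/_temporary/0/_temporary/attempt_0");
        System.out.println(resolveOutputFile(conf, "part-00000.parquet"));
    }
}
```

Hadoop's task-side call corresponds to the setWorkOutputPath() line in main(); the fix discussed above would amount to Spark making an equivalent call before handing the JobConf to the output format.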