- Type: Improvement
- Status: Open
- Priority: Minor
- Resolution: Unresolved
- Affects Version/s: 3.1.0
- Fix Version/s: None
- Component/s: Input/Output
- Labels: None
In the current Spark implementation, if you set spark.sql.sources.partitionOverwriteMode=dynamic, then even with mapreduce.fileoutputcommitter.algorithm.version=2 the partition folders are still renamed sequentially in the commitJob stage. This is very slow on cloud storage. Should we commit the data in a way similar to FileOutputCommitter v2?
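For reference, the configuration combination described above can be sketched as follows (a minimal, hypothetical example; the application name, DataFrame, partition column, and output path are illustrative, not from the issue):

```scala
import org.apache.spark.sql.SparkSession

// Hypothetical sketch of the setup described in this issue.
val spark = SparkSession.builder()
  .appName("DynamicPartitionOverwriteExample")
  // Dynamic mode: only the partitions touched by the write are overwritten.
  .config("spark.sql.sources.partitionOverwriteMode", "dynamic")
  // v2 commit algorithm: tasks normally promote their files directly,
  // avoiding a second round of renames in commitJob.
  .config("mapreduce.fileoutputcommitter.algorithm.version", "2")
  .getOrCreate()

val df = spark.read.parquet("s3a://bucket/input")  // illustrative path

// Per the issue, even with algorithm version 2 configured, the dynamic
// overwrite path still renames the staged partition directories one by
// one during commitJob, which is slow on object stores.
df.write
  .mode("overwrite")
  .partitionBy("date")
  .parquet("s3a://bucket/output")  // illustrative path
```

On object stores such as S3, a rename is a copy followed by a delete, so sequential directory renames in commitJob scale poorly with the number of partitions written.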
- relates to SPARK-20049: Writing data to Parquet with partitions takes very long after the job finishes (Open)