Description
Dynamic partitioning currently performs poorly on large data sets because of GC and memory overhead. Each task opens a separate writer for every partition it encounters, which produces many small files and heavy GC pressure. We can shuffle the data by the partition columns so that each partition ends up with only one file, which also reduces the GC overhead.
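To make the effect concrete, here is a minimal sketch in plain Python (not Spark code) that counts how many writers, and therefore output files, are opened with and without a shuffle on the partition column. The column name `dt`, the helper names, and the sample data are all hypothetical and chosen only for illustration.

```python
def count_writers(tasks, part_col="dt"):
    """Each task opens one writer per distinct partition value it sees."""
    return sum(len({row[part_col] for row in rows}) for rows in tasks)


def shuffle_by_partition(tasks, num_tasks, part_col="dt"):
    """Route every row to the task chosen by hashing its partition value,
    so all rows of one partition end up in the same task."""
    shuffled = [[] for _ in range(num_tasks)]
    for rows in tasks:
        for row in rows:
            shuffled[hash(row[part_col]) % num_tasks].append(row)
    return shuffled


# Hypothetical input: 4 tasks, each holding rows for the same 3 partition values.
dates = ("2015-01-01", "2015-01-02", "2015-01-03")
tasks = [[{"dt": d, "value": i} for d in dates] for i in range(4)]

writers_before = count_writers(tasks)                          # tasks x partitions
writers_after = count_writers(shuffle_by_partition(tasks, 4))  # one per partition

print(writers_before, writers_after)
```

Without the shuffle the writer count grows with tasks times partitions; after it, the count equals the number of distinct partition values. The trade-off is the cost of the extra shuffle itself, which this sketch does not model.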
Issue Links
- duplicates SPARK-8890 Reduce memory consumption for dynamic partition insert (Resolved)