Details
-
New Feature
-
Status: Resolved
-
Minor
-
Resolution: Abandoned
-
3.5.1
-
None
-
None
Description
Currently default coalsce does not take partition size into account and simply merges partitions. This often results in non-uniform data distribution. There have been proposal for size based coalesce(https://github.com/apache/spark/pull/27248).
I am proposing a custom roundrobin coalesce which will distribute data evenly across partitions within same executor.