Our fused parfor spark datapartition-execute job - as used for large scenarios of univariate statistics - exhibits some unnecessary runtime overheads. In detail, the potential improvements includes:
1) Incremental nnz maintenance on partition collect
2) Reuse of dense partitions per task (avoid reallocation)
3) Explicitly control the number of output partitions (avoid OOMs, reduce memory pressure)
4) Avoid unnecessary rdd export on parfor data partitioning
The points (3) and (4) also apply to the parfor spark datapartition job.