Details
Type: Sub-task
Status: Closed
Priority: Major
Resolution: Done
Description
This task aims to avoid unnecessary input caching for parfor Spark datapartition-execute jobs (with grouping). The goal is to reduce memory pressure, and thus garbage-collection overhead, during the shuffle and the subsequent execution. We apply this optimization only in the general case with grouping, and only if the input is a persisted RDD that has not been cached yet.
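The applicability condition above can be summarized as a small predicate. The following is a minimal Python sketch, not the project's actual implementation: `InputRDDState`, `should_skip_input_caching`, and the `persisted`/`cached` flags are hypothetical stand-ins for however the runtime tracks an input RDD's state. In Spark terms, `persisted` roughly means the RDD's storage level is something other than `StorageLevel.NONE`, and `cached` means some of its partitions are actually materialized.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class InputRDDState:
    """Hypothetical summary of the parfor input RDD's caching state."""
    persisted: bool  # a storage level other than NONE has been requested
    cached: bool     # partitions have actually been materialized in storage


def should_skip_input_caching(has_grouping: bool, rdd: InputRDDState) -> bool:
    """Return True if the input should be read without populating the cache.

    Hypothetical helper illustrating the rule from the description.
    """
    # Only the general case with grouping (i.e., with a shuffle) is targeted.
    if not has_grouping:
        return False
    # Only a persisted-but-not-yet-cached input benefits: an already cached
    # RDD is simply reused, and a non-persisted RDD would not be cached anyway.
    return rdd.persisted and not rdd.cached


if __name__ == "__main__":
    print(should_skip_input_caching(True, InputRDDState(persisted=True, cached=False)))
```

The `cached` check is separate from `persisted` because Spark's `persist()` is lazy: marking an RDD as persisted stores nothing until an action computes it, so the shuffle in the datapartition-execute job would be what first fills the cache.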