Details
Type: Sub-task
Status: Closed
Priority: Major
Resolution: Done
Description
This task aims to make the memory handling of re-used and temporary broadcast variables more robust in order to avoid unnecessary OOMs.
1) Explicitly destroy temporary broadcast variables. Otherwise we would never clean them up ourselves, and Spark's ContextCleaner seems to be a best-effort daemon that triggers only every 30 min or on garbage collection, which might be too late if a large object is being allocated.
2) Keep track of the currently softly reachable (re-used) broadcasts so that their size is taken into account when deciding on a guarded collect.
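
The two points above could be sketched roughly as follows. This is only an illustration using the Java standard library, not the actual change: BroadcastTracker, BroadcastHandle, FixedSizeHandle and the byte budget passed to canCollect are hypothetical names introduced here. With Spark, the handle would wrap a real broadcast variable and destroy() would delegate to its destroy method.

```java
import java.lang.ref.SoftReference;
import java.util.ArrayList;
import java.util.Iterator;
import java.util.List;

public class BroadcastTracker {
    /** Hypothetical stand-in for a broadcast handle; not a Spark type. */
    public interface BroadcastHandle {
        long sizeInBytes();
        void destroy();
    }

    /** Trivial handle with a fixed size, for illustration only. */
    public static final class FixedSizeHandle implements BroadcastHandle {
        private final long size;
        private boolean destroyed = false;
        public FixedSizeHandle(long size) { this.size = size; }
        @Override public long sizeInBytes() { return size; }
        @Override public void destroy() { destroyed = true; }
        public boolean isDestroyed() { return destroyed; }
    }

    /** Soft reference to a re-used broadcast plus its size recorded at registration. */
    private static final class Entry {
        final SoftReference<BroadcastHandle> ref;
        final long size;
        Entry(BroadcastHandle h) {
            this.ref = new SoftReference<>(h);
            this.size = h.sizeInBytes();
        }
    }

    private final List<Entry> reused = new ArrayList<>();

    /** Point 2: remember a re-used broadcast without keeping it strongly reachable. */
    public synchronized void registerReused(BroadcastHandle h) {
        reused.add(new Entry(h));
    }

    /** Sum the sizes of broadcasts the JVM has not yet cleared; drop cleared entries. */
    public synchronized long softlyReachableBytes() {
        long sum = 0;
        Iterator<Entry> it = reused.iterator();
        while (it.hasNext()) {
            Entry e = it.next();
            if (e.ref.get() == null) {
                it.remove();      // referent was reclaimed, so it no longer counts
            } else {
                sum += e.size;
            }
        }
        return sum;
    }

    /** Guarded-collect check: the result must fit next to the still-reachable broadcasts. */
    public synchronized boolean canCollect(long resultBytes, long budgetBytes) {
        return resultBytes + softlyReachableBytes() <= budgetBytes;
    }

    /** Point 1: destroy a temporary broadcast right after use instead of waiting for a cleaner. */
    public static void destroyTemporary(BroadcastHandle h) {
        h.destroy();
    }
}
```

In this sketch the size is recorded at registration time, so the check does not need to touch the broadcast data again. Entries whose referent has been reclaimed are dropped lazily on the next size query.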