Details
- Type: Sub-task
- Status: Closed
- Priority: Major
- Resolution: Done
Description
In contrast to parfor MR jobs, where every task has its own process-local buffer pool, on Spark with multi-threaded executors multiple tasks share a common buffer pool. This is advantageous because common inputs are read only once. However, it also requires synchronized buffer pool initialization and cleanup per executor. The cleanup in particular (e.g., of created cache directories) is tricky because Spark does not provide an executor close call. Hence, our approach is to use a robust version of deleteOnExit that is independent of the exit code and also removes remaining files that were unknown at the time of delete registration.
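As a rough illustration of the idea (not the actual implementation for this task), the sketch below combines a once-per-JVM synchronized initialization with a shutdown hook that deletes the cache directory recursively. The class name `ExecutorCacheCleanup` and its methods are hypothetical. It assumes that a JVM shutdown hook is an acceptable substitute for the missing executor close call. Shutdown hooks run on normal termination and on `System.exit` with any exit code, but not if the JVM is killed forcibly (e.g., SIGKILL or `Runtime.halt`).

```java
import java.io.IOException;
import java.nio.file.FileVisitResult;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.SimpleFileVisitor;
import java.nio.file.attribute.BasicFileAttributes;

// Hypothetical helper, for illustration only.
public class ExecutorCacheCleanup {
    // Shared by all tasks in this JVM (executor); guarded by the class lock.
    private static Path cacheDir = null;

    // Every task calls this; only the first call per JVM creates the
    // cache directory and registers the cleanup hook.
    public static synchronized Path initOnce(Path parent) throws IOException {
        if (cacheDir == null) {
            Path dir = Files.createTempDirectory(parent, "cache_");
            // The directory contents are walked at exit time, so files created
            // after registration are removed as well.
            Runtime.getRuntime().addShutdownHook(
                new Thread(() -> deleteRecursively(dir)));
            cacheDir = dir;
        }
        return cacheDir;
    }

    // Best-effort recursive delete: files first, then their directories.
    public static void deleteRecursively(Path root) {
        if (!Files.exists(root)) {
            return;
        }
        try {
            Files.walkFileTree(root, new SimpleFileVisitor<Path>() {
                @Override
                public FileVisitResult visitFile(Path file, BasicFileAttributes attrs)
                        throws IOException {
                    Files.deleteIfExists(file);
                    return FileVisitResult.CONTINUE;
                }

                @Override
                public FileVisitResult postVisitDirectory(Path dir, IOException exc)
                        throws IOException {
                    Files.deleteIfExists(dir);
                    return FileVisitResult.CONTINUE;
                }
            });
        } catch (IOException e) {
            // Cleanup at exit is best effort; nothing useful can be done here.
        }
    }
}
```

Unlike `File.deleteOnExit`, which only removes the paths registered at call time and fails on non-empty directories, this variant inspects the directory when the JVM shuts down, so it does not need to know the files in advance.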
Issue Links
- is duplicated by: SYSTEMDS-1130 Parfor remote_spark misses a proper cleanup (Closed)