Description
The parfor optimizer may decide to execute the entire loop as a remote Spark job to utilize cluster parallelism. In this case, all inputs to the parfor body (i.e., variables that are created or read outside the parfor body but used or overwritten inside it) are read from HDFS. In the past, there was an issue of redundant reads, which has been addressed with SYSTEMML-1879. However, the direct use of Spark broadcast variables would likely improve performance further, especially in clusters with many nodes.
This task aims to leverage Spark broadcast variables for all parfor inputs. In detail, this entails two major aspects. First, we need runtime support to optionally broadcast the inputs via broadcast variables in RemoteParForSpark and obtain them from these broadcast variables in RemoteParForSparkWorker without causing unnecessary eviction. In contrast to the existing broadcast primitives, we do not need to blockify the matrix because it is accessed in full by in-memory operations. Second, this requires an extension of the parfor optimizer to reason about scenarios where broadcasting is safe, because broadcasts introduce additional memory requirements as they act as pinned in-memory matrices. This second task likely overlaps with SYSTEMML-1349, which requires similar reasoning to handle shared reads.
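The optimizer-side safety check described above could be sketched roughly as follows. This is a minimal Python illustration under stated assumptions: `MatrixInfo`, `can_broadcast`, the size estimator, and the 70% usable-memory fraction are all hypothetical names and constants for illustration, not SystemML's actual optimizer API or cost model.

```python
# Hypothetical sketch: broadcast parfor inputs only if the pinned copies fit
# into the remaining executor memory budget. All names and constants here are
# illustrative assumptions, not SystemML internals.
from dataclasses import dataclass

@dataclass
class MatrixInfo:
    rows: int
    cols: int
    sparsity: float  # fraction of non-zero cells

    def estimated_size(self) -> float:
        # Rough size estimate in bytes: 8 bytes per double for dense,
        # ~16 bytes per non-zero (value + index overhead) for sparse.
        if self.sparsity < 0.4:
            return self.rows * self.cols * self.sparsity * 16
        return self.rows * self.cols * 8

def can_broadcast(inputs, executor_mem_bytes, op_mem_bytes, frac=0.7):
    """Return True if all parfor inputs can be pinned as broadcast
    variables without exceeding the usable executor memory budget
    (a fraction of executor memory minus the operation's own needs)."""
    budget = executor_mem_bytes * frac - op_mem_bytes
    pinned = sum(m.estimated_size() for m in inputs)
    return pinned <= budget

# A ~80 MB dense input fits a 1 GB executor with 200 MB operation memory;
# a ~8 GB dense input does not.
small = MatrixInfo(rows=10_000, cols=1_000, sparsity=1.0)
large = MatrixInfo(rows=100_000, cols=10_000, sparsity=1.0)
ok_small = can_broadcast([small], executor_mem_bytes=1 << 30, op_mem_bytes=200 << 20)
ok_large = can_broadcast([large], executor_mem_bytes=1 << 30, op_mem_bytes=200 << 20)
```

Falling back to HDFS reads when the check fails mirrors the safe default that exists today, so the broadcast path is a pure optimization.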
Issue Links
- depends upon: SYSTEMDS-1349 Parfor optimizer support for shared reads (lower memory req) (Closed)