Details
Type: Improvement
Status: Closed
Priority: Minor
Resolution: Later
Description
In systems like Spark, broadcast data is usually cached per executor, since the same data can be reused across multiple tasks.
We can do something similar to avoid fetching the same data redundantly. So far, my experience using a Guava cache to 'load' broadcast data has been good. It may be worthwhile to expose this feature as an execution property that optimization passes can configure.
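
As a rough illustration of the idea, the sketch below caches broadcast data per executor so that each broadcast ID is fetched at most once and then shared across tasks. It is only a sketch under assumptions: the class name `BroadcastCache`, the `fetcher` function, and the `invalidate` method are hypothetical and not part of any existing API, and a plain `ConcurrentHashMap` stands in for a Guava `LoadingCache` to keep the example dependency-free. A Guava cache would play the same get-or-load role and could additionally bound the cache size or expire entries.

```java
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.ConcurrentMap;
import java.util.function.Function;

/**
 * Hypothetical per-executor cache for broadcast data (illustrative only).
 * One instance is assumed to be shared by all tasks running on an executor.
 */
final class BroadcastCache<K, V> {
  // Maps a broadcast ID to the data that was already fetched for it.
  private final ConcurrentMap<K, V> cache = new ConcurrentHashMap<>();
  // Assumed to perform the actual (expensive) remote fetch for an ID.
  private final Function<K, V> fetcher;

  BroadcastCache(final Function<K, V> fetcher) {
    this.fetcher = fetcher;
  }

  /** Returns the cached data, invoking the fetcher only on the first request for an ID. */
  V get(final K broadcastId) {
    // computeIfAbsent runs the fetcher atomically, at most once per absent key.
    return cache.computeIfAbsent(broadcastId, fetcher);
  }

  /** Drops a cached entry, e.g. once no remaining task needs the broadcast. */
  void invalidate(final K broadcastId) {
    cache.remove(broadcastId);
  }
}
```

With a cache like this, the first task that calls `get` for a given ID pays for the fetch, and later tasks on the same executor reuse the same object. Whether the cache is enabled could then be decided by an execution property, as suggested above.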