  Apache Nemo / NEMO-56

Cache broadcasted data

    Details

    • Type: Improvement
    • Status: Closed
    • Priority: Minor
    • Resolution: Later
    • Affects Version/s: None
    • Fix Version/s: None
    • Labels: None

      Description

      In systems like Spark, broadcasted data are usually cached per executor, as the same data can be reused across multiple tasks.

      We can do something similar to avoid fetching the same data redundantly. My experience so far with using a Guava cache to 'load' broadcasted data has been good. It may be worthwhile to expose this feature as an execution property that optimization passes can configure.
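
To illustrate the load-on-miss behavior described above, here is a minimal sketch of a per-executor cache. It is not Nemo code: the class name `BroadcastCache`, the `fetcher` function, and the `broadcastId` key are all hypothetical. It also swaps the Guava cache mentioned above for the standard library's `ConcurrentHashMap.computeIfAbsent`, which gives similar "load once, reuse afterwards" semantics without an extra dependency but has no eviction or size limit.

```java
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.ConcurrentMap;
import java.util.function.Function;

/**
 * Hypothetical per-executor cache for broadcasted data (illustration only,
 * not part of Nemo's API). One instance would be shared by all tasks
 * running in the same executor.
 */
final class BroadcastCache<K, V> {
  // Maps a broadcast id to the data that was already fetched for it.
  private final ConcurrentMap<K, V> cache = new ConcurrentHashMap<>();

  // Assumed to perform the expensive remote fetch for a given broadcast id.
  private final Function<K, V> fetcher;

  BroadcastCache(final Function<K, V> fetcher) {
    this.fetcher = fetcher;
  }

  /**
   * Returns the cached data for the id, invoking the fetcher only on a miss.
   * ConcurrentHashMap applies the function at most once per absent key,
   * so concurrent tasks asking for the same id share a single fetch.
   */
  V get(final K broadcastId) {
    return cache.computeIfAbsent(broadcastId, fetcher);
  }

  /** Drops the entry for the id, e.g. once no remaining task needs it. */
  void invalidate(final K broadcastId) {
    cache.remove(broadcastId);
  }
}
```

With this shape, two tasks in the same executor that call `get` with the same id trigger a single fetch. A real implementation would likely also need a bound on memory use, which is one reason a Guava cache with size or time-based eviction may be the better fit.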

              People

              • Assignee: Unassigned
              • Reporter: John Yang (johnyangk)
              • Votes: 0
              • Watchers: 2
