Uploaded image for project: 'Spark'
  1. Spark
  2. SPARK-4290

Provide an equivalent functionality of distributed cache as MR does

    XMLWordPrintableJSON

    Details

    • Type: Improvement
    • Status: Resolved
    • Priority: Major
    • Resolution: Not A Problem
    • Affects Version/s: None
    • Fix Version/s: None
    • Component/s: Spark Core
    • Labels:
      None

      Description

      MapReduce allows client to specify files to be put in distributed cache for a job and the framework guarentees that the file will be available in local file system of a node where a task of the job runs and before the tasks actually starts. While this might be achieved with Yarn via hacks, it's not available in other clusters. It would be nice to have such an equivalent functionality like this in Spark.

      It would also complement Spark's broadcast variable, which may not be suitable in certain scenarios.

        Attachments

          Issue Links

            Activity

              People

              • Assignee:
                Unassigned
                Reporter:
                xuefuz Xuefu Zhang
              • Votes:
                0 Vote for this issue
                Watchers:
                4 Start watching this issue

                Dates

                • Created:
                  Updated:
                  Resolved: