Spark / SPARK-26268

Decouple shuffle data from Spark deployment

    Details

    • Type: Improvement
    • Status: In Progress
    • Priority: Major
    • Resolution: Unresolved
    • Affects Version/s: 3.0.0
    • Fix Version/s: None
    • Component/s: Shuffle
    • Labels: None

      Description

      Currently, the batch scheduler assumes that shuffle data is tied to executors. As a result, when an executor is lost, all map tasks that ran on that executor are rescheduled unless the "external" shuffle service is in use. Note that this service is external only in the sense that it does not live inside the executors themselves; its implementation cannot be swapped out, and it is assumed to speak the BlockManager language.

      The following changes would facilitate external shuffle (see SPARK-25299 for motivation):

      • Do not rerun map tasks on lost executors when shuffle data is stored externally. For example, this could be determined by a property or by an additional method that all ShuffleManagers implement.
      • Do not assume that shuffle data is stored in the standard BlockManager format or that a BlockManager is or must be available to ShuffleManagers.
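
      The first change can be illustrated with a toy model of the rescheduling decision. This is only a sketch and does not use Spark's actual scheduler or ShuffleManager API: the names MapOutput, ToyShuffleManager, stores_data_externally, and map_tasks_to_rerun are all hypothetical, and the flag stands in for the property or additional ShuffleManager method suggested above.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class MapOutput:
    """Record of one finished map task and the executor it ran on."""
    map_id: int
    executor_id: str


class ToyShuffleManager:
    """Stand-in for a shuffle manager; NOT Spark's ShuffleManager interface."""

    def __init__(self, stores_data_externally: bool):
        # Hypothetical flag: True means shuffle output outlives the executor.
        self.stores_data_externally = stores_data_externally


def map_tasks_to_rerun(outputs, lost_executor, manager,
                       external_shuffle_service=False):
    """Return the map IDs that must be recomputed after an executor is lost."""
    if manager.stores_data_externally or external_shuffle_service:
        # The output is still reachable, so nothing is rescheduled.
        return []
    # Otherwise every map task that ran on the lost executor is rerun.
    return sorted(o.map_id for o in outputs if o.executor_id == lost_executor)


outputs = [MapOutput(0, "exec-1"), MapOutput(1, "exec-2"), MapOutput(2, "exec-1")]
local = ToyShuffleManager(stores_data_externally=False)
remote = ToyShuffleManager(stores_data_externally=True)

print(map_tasks_to_rerun(outputs, "exec-1", local))
print(map_tasks_to_rerun(outputs, "exec-1", remote))
```

      In this model, the existing external shuffle service and a remote shuffle implementation take the same branch; the difference is that the second is declared by the shuffle manager itself rather than hard-wired into the scheduler.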

      Note that only the first change is strictly required to realize the benefits of remote shuffle implementations, since such implementations can use a phony (or null) BlockManager.

              People

              • Assignee: Unassigned
              • Reporter: Ben Sidhom (bsidhom)
              • Votes: 1
              • Watchers: 8
