Spark / SPARK-44951

Improve Spark Dynamic Allocation


Details

    • Type: Improvement
    • Status: Open
    • Priority: Major
    • Resolution: Unresolved
    • Affects Version/s: 4.0.0
    • Fix Version/s: None
    • Component/s: Kubernetes, Spark Core, YARN
    • Labels: None

Description

For Spark 4 we should aim to improve Spark's dynamic allocation. Some potential ideas include the following:

    • Pluggable DEA (dynamic executor allocation) algorithms; a possible interface is sketched after this list.
    • Reduce waste on the resource manager (RM) side: sometimes the driver requests resources, but by the time the RM provisions them the driver no longer needs them and cancels the request.
    • Support for "warm" executor pools that are not tied to a particular driver, but instead start up and wait for a driver to connect and "claim" them.
    • More explicit cost vs. application-runtime configuration: a good DEA algorithm should let the developer choose between cost and runtime, since some developers are willing to pay more for faster execution; a possible configuration surface is sketched below.
    • Use information from previous runs to inform future runs.
    • Better selection of which executors to scale down.
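
To make the pluggable-algorithms idea concrete, a policy interface could look roughly like the sketch below. This is only a sketch: Spark exposes no such SPI today, and the trait name, method signatures, and the BacklogPolicy example are all hypothetical.

{code:scala}
// Hypothetical sketch only: Spark does not currently expose a pluggable
// allocation-policy interface; every name and signature here is invented
// for illustration.
trait ExecutorAllocationPolicy {

  /** Return the desired total number of executors for the current load. */
  def targetExecutors(
      pendingTasks: Int,
      runningTasks: Int,
      currentExecutors: Int): Int

  /** Choose which idle executors to release when scaling down. */
  def selectExecutorsToRemove(idleExecutorIds: Seq[String], count: Int): Seq[String]
}

/** An example policy mirroring today's backlog-based behavior. */
class BacklogPolicy(tasksPerExecutor: Int) extends ExecutorAllocationPolicy {

  override def targetExecutors(
      pendingTasks: Int,
      runningTasks: Int,
      currentExecutors: Int): Int = {
    val needed =
      math.ceil((pendingTasks + runningTasks).toDouble / tasksPerExecutor).toInt
    math.max(needed, 1) // never request fewer than one executor
  }

  // Naive choice: release the first idle executors. A smarter policy could
  // prefer executors holding no cached blocks, addressing the last idea above.
  override def selectExecutorsToRemove(
      idleExecutorIds: Seq[String],
      count: Int): Seq[String] =
    idleExecutorIds.take(count)
}
{code}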
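The cost vs. runtime trade-off could similarly be surfaced through configuration. In the sketch below the dynamic allocation settings are real Spark configs, while spark.dynamicAllocation.runtimePreference is purely hypothetical, shown only to illustrate the shape such a knob might take.

{code:scala}
import org.apache.spark.sql.SparkSession

val spark = SparkSession.builder()
  .appName("dea-cost-vs-runtime")
  // Existing dynamic allocation knobs (these configs exist in Spark today).
  .config("spark.dynamicAllocation.enabled", "true")
  .config("spark.dynamicAllocation.minExecutors", "2")
  .config("spark.dynamicAllocation.maxExecutors", "100")
  .config("spark.dynamicAllocation.executorIdleTimeout", "60s")
  // HYPOTHETICAL knob, not a real Spark config: 0.0 would mean
  // "minimize cost", 1.0 "minimize runtime".
  .config("spark.dynamicAllocation.runtimePreference", "0.7")
  .getOrCreate()
{code}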


People

    Assignee: Unassigned
    Reporter: Holden Karau (holden)

Dates

    Created:
    Updated:
