Spark / SPARK-30191

AM should update pending resource request faster when driver lost executor


Details

    • Type: Improvement
    • Status: In Progress
    • Priority: Minor
    • Resolution: Unresolved
    • Affects Version/s: 3.1.0
    • Fix Version/s: None
    • Component/s: Spark Core, YARN
    • Labels: None

    Description

      I run Spark on YARN. When a machine fails because of a hardware problem, every service on that node, including the NodeManager and the executors, is killed, and the driver loses those executors. However, the ResourceManager cannot update the container information for that node until it removes the node, which always takes 10 minutes or longer. In the meantime, the AM does not add new resource requests, so the driver stays short of executors.

      So perhaps the AM should take `numExecutorsExiting` into account in YarnAllocator's `updateResourceRequests` method to fix this.
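      The proposal can be illustrated with a simplified, hypothetical model. This is not Spark's actual YarnAllocator code: it assumes the number of containers to request is the target executor count minus pending requests, starting executors, and running executors, and it assumes `numExecutorsExiting` would count executors the driver has already lost but whose containers YARN has not yet reported as completed. The class and method names are illustrative only.

```java
/**
 * Hypothetical sketch (not Spark's actual code) of how a YARN ApplicationMaster
 * could decide how many new containers to request. numExecutorsExiting is the
 * factor proposed in this issue; its exact semantics here are an assumption.
 */
final class MissingExecutorsSketch {

    /**
     * Simplified current behaviour (assumption): an executor whose container
     * YARN has not yet reported as completed still counts as running.
     */
    static int missingWithoutExiting(int target, int pendingAllocate,
                                     int starting, int running) {
        return target - pendingAllocate - starting - running;
    }

    /**
     * Proposed behaviour (assumption): executors the driver has already lost,
     * but whose containers YARN still reports as live, are excluded from the
     * running count so replacements can be requested immediately.
     */
    static int missingWithExiting(int target, int pendingAllocate, int starting,
                                  int running, int exiting) {
        // Clamp at zero in case the two counters briefly disagree.
        int effectivelyRunning = Math.max(0, running - exiting);
        return target - pendingAllocate - starting - effectivelyRunning;
    }

    public static void main(String[] args) {
        // 10 executors wanted; YARN still reports all 10 as running,
        // but 3 of them were on the dead node.
        int target = 10, pending = 0, starting = 0, running = 10, exiting = 3;
        System.out.println(missingWithoutExiting(target, pending, starting, running)); // 0
        System.out.println(missingWithExiting(target, pending, starting, running, exiting)); // 3
    }
}
```

      In this example, the first formula requests nothing until the ResourceManager removes the node, while subtracting the exiting executors requests 3 replacements right away.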


            People

              Assignee: Unassigned
              Reporter: Max Xie (max2049)
              Votes: 0
              Watchers: 1

              Dates

                Created:
                Updated: