  1. Spark
  2. SPARK-42766

YarnAllocator should filter excluded nodes when launching allocated containers

Details

    • Type: Improvement
    • Status: In Progress
    • Priority: Major
    • Resolution: Unresolved
    • Affects Version/s: 3.3.2
    • Fix Version/s: None
    • Component/s: YARN
    • Labels: None

    Description

      In a production environment, we hit the following issue:

      Suppose we request 10 containers from nodeA and nodeB. The first response from YARN returns 5 containers from nodeA and nodeB, and then nodeA is blacklisted (excluded). The second response from YARN may still return some containers from nodeA, and those containers are launched. When these containers (executors) finish setup and send a register request to the driver, the request is rejected. Each such failure is counted toward spark.yarn.max.executor.failures, which can eventually cause the application to fail with:

      Max number of executor failures ($maxNumExecutorFailures) reached
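
The filtering proposed in the title could look roughly like the sketch below. It is only an illustration under stated assumptions, not the actual YarnAllocator code: `AllocatedContainer` and `ExcludedNodeFilter` are hypothetical names, containers are reduced to an id and a host, and the excluded-node set is assumed to be available when the allocation response is handled. The idea is to split each response into containers that are safe to launch and containers that should be handed back to YARN, so that executors on excluded nodes never start and therefore never count as failures.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Set;

// Hypothetical stand-in for an allocated YARN container:
// just an id and the host the container was granted on.
final class AllocatedContainer {
    final String id;
    final String host;

    AllocatedContainer(String id, String host) {
        this.id = id;
        this.host = host;
    }
}

// Hypothetical helper (not part of Spark): partitions one allocation
// response by whether the container's host is currently excluded.
final class ExcludedNodeFilter {
    // Containers on healthy hosts; these would go on to be launched.
    final List<AllocatedContainer> toLaunch = new ArrayList<>();
    // Containers on excluded hosts; these would be released instead of launched.
    final List<AllocatedContainer> toRelease = new ArrayList<>();

    ExcludedNodeFilter(List<AllocatedContainer> allocated, Set<String> excludedHosts) {
        for (AllocatedContainer c : allocated) {
            if (excludedHosts.contains(c.host)) {
                toRelease.add(c);
            } else {
                toLaunch.add(c);
            }
        }
    }
}
```

In the scenario above, a second response containing containers on both nodeA and nodeB, with nodeA excluded, would leave only the nodeB containers in `toLaunch`. The nodeA containers would end up in `toRelease` and could be returned to YARN without ever registering with the driver.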

       

          People

            Assignee: Unassigned
            Reporter: wangshengjie