SPARK-21383

YARN can allocate too many executors

    Details

    • Type: Bug
    • Status: Resolved
    • Priority: Major
    • Resolution: Fixed
    • Affects Version/s: 2.0.0
    • Fix Version/s: 2.2.1, 2.3.0
    • Component/s: YARN
    • Labels: None

      Description

      The YarnAllocator doesn't properly track containers that are being launched but are not yet running. If launching the containers on the NM takes time, they are not yet counted in numExecutorsRunning, but they have already been removed from the pending list. If allocateResources is called again in that window, the allocator can conclude that executors are missing when they are not (they just haven't been launched yet).

      This was introduced by SPARK-12447.

      Where it checks for missing executors:
      https://github.com/apache/spark/blob/master/resource-managers/yarn/src/main/scala/org/apache/spark/deploy/yarn/YarnAllocator.scala#L297

      numExecutorsRunning is only updated after the NM has started the container:
      https://github.com/apache/spark/blob/master/resource-managers/yarn/src/main/scala/org/apache/spark/deploy/yarn/YarnAllocator.scala#L524

      Thus, if the NM or the network is slow, the allocator can miscount and start additional executors.
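
To make the race concrete, here is a minimal toy model of the accounting described above. It is not Spark's YarnAllocator: the class, the method names, and the `starting` counter are all hypothetical and only illustrate the bookkeeping. With `count_starting=False` the model ignores containers that have been allocated but not yet started, as in the report. With `count_starting=True` it also counts them, which is one plausible way to close the gap.

```python
class ToyAllocator:
    """Toy model of executor accounting (hypothetical, not Spark code)."""

    def __init__(self, target, count_starting):
        self.target = target
        self.count_starting = count_starting
        self.pending = 0    # requested from the RM, not yet allocated
        self.starting = 0   # allocated, launch on the NM still in progress
        self.running = 0    # NM has confirmed the launch

    def missing(self):
        # Executors the allocator believes it still has to request.
        known = self.pending + self.running
        if self.count_starting:
            known += self.starting
        return self.target - known

    def allocate_resources(self):
        # Request whatever appears to be missing; return how many were requested.
        missing = self.missing()
        if missing > 0:
            self.pending += missing
        return max(missing, 0)

    def on_containers_allocated(self, n):
        # The RM granted n containers: they leave the pending list immediately.
        self.pending -= n
        self.starting += n

    def on_containers_started(self, n):
        # The NM finished launching n containers.
        self.starting -= n
        self.running += n

    def total(self):
        return self.pending + self.starting + self.running


def simulate(count_starting):
    alloc = ToyAllocator(target=4, count_starting=count_starting)
    alloc.allocate_resources()        # first pass requests 4 executors
    alloc.on_containers_allocated(4)  # RM grants them; NM launch is slow
    alloc.allocate_resources()        # second pass runs before any launch completes
    return alloc.total()              # executors requested, launching, or running


if __name__ == "__main__":
    print("without tracking starting containers:", simulate(False))
    print("with tracking starting containers:", simulate(True))
```

In this model the second allocateResources pass requests four more executors when launching containers are ignored, so the total doubles from the target of 4 to 8. When they are counted, the total stays at 4.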

              People

              • Assignee: DjvuLee
              • Reporter: Thomas Graves (tgraves)