Spark / SPARK-17511

Dynamic allocation race condition: Containers getting marked failed while releasing


    Details

    • Type: Bug
    • Status: Resolved
    • Priority: Major
    • Resolution: Fixed
    • Affects Version/s: 2.0.0, 2.0.1, 2.1.0
    • Fix Version/s: 2.0.1, 2.1.0
    • Component/s: YARN
    • Labels: None

      Description

      While launching multiple containers in the launcher thread pool, if the number of running executors reaches or exceeds the target number of running executors, the container is released and marked as failed. This can cause many jobs to be marked as failed, resulting in overall job failure.

      I will have a patch up soon after completing testing.

      Typical exception found in the driver when the container is marked as failed:
      java.lang.AssertionError: assertion failed
              at scala.Predef$.assert(Predef.scala:156)
              at org.apache.spark.deploy.yarn.YarnAllocator$$anonfun$runAllocatedContainers$1.org$apache$spark$deploy$yarn$YarnAllocator$$anonfun$$updateInternalState$1(YarnAllocator.scala:489)
              at org.apache.spark.deploy.yarn.YarnAllocator$$anonfun$runAllocatedContainers$1$$anon$1.run(YarnAllocator.scala:519)
              at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142)
              at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617)
              at java.lang.Thread.run(Thread.java:745)
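The failure mode above can be reproduced in a small Scala model. This is a hypothetical sketch, not Spark's code: `ToyAllocator`, `onContainerLaunched`, `launchAll` and the counters are illustrative names loosely mirroring the `updateInternalState` frame in the stack trace. It assumes more containers are handed to the launcher thread pool than the target allows, because the running count is only updated after each launch completes. The `guarded` branch shows one possible fix shape: re-checking the target under the same lock as the increment and skipping the surplus container. This is an assumption and not necessarily the committed patch.

```scala
import java.util.concurrent.{Executors, TimeUnit}
import scala.util.control.NonFatal

// Hypothetical toy model of the executor bookkeeping in SPARK-17511.
// None of these names are part of Spark's YarnAllocator API.
class ToyAllocator(targetNumExecutors: Int, guarded: Boolean) {
  private var numExecutorsRunning = 0
  private var numFailed = 0  // surplus containers counted as failed
  private var numSkipped = 0 // surplus containers skipped without a failure

  // Runs on a launcher thread once a (simulated) container has started.
  private def onContainerLaunched(): Unit = synchronized {
    if (guarded && numExecutorsRunning >= targetNumExecutors) {
      // Assumed fix: the target is already met, so skip this container
      // instead of letting the assertion below fire.
      numSkipped += 1
    } else {
      try {
        // Same shape as updateInternalState() in the trace:
        // bump the count, then assert it never exceeds the target.
        numExecutorsRunning += 1
        assert(numExecutorsRunning <= targetNumExecutors)
      } catch {
        case NonFatal(_) =>
          // Unguarded path: the AssertionError is caught (NonFatal matches
          // it). This model rolls back the increment and counts a failure.
          numExecutorsRunning -= 1
          numFailed += 1
      }
    }
  }

  // Hands every allocated container to the pool without re-checking the
  // target, then returns (running, failed, skipped) once all launches finish.
  def launchAll(numContainers: Int): (Int, Int, Int) = {
    val pool = Executors.newFixedThreadPool(4)
    (1 to numContainers).foreach { _ =>
      pool.execute(new Runnable {
        override def run(): Unit = onContainerLaunched()
      })
    }
    pool.shutdown()
    pool.awaitTermination(10, TimeUnit.SECONDS)
    synchronized((numExecutorsRunning, numFailed, numSkipped))
  }
}

object Spark17511Demo {
  def main(args: Array[String]): Unit = {
    // 8 containers arrive for a target of 5 executors.
    val racy = new ToyAllocator(targetNumExecutors = 5, guarded = false).launchAll(8)
    val fixed = new ToyAllocator(targetNumExecutors = 5, guarded = true).launchAll(8)
    println(s"unguarded (running, failed, skipped) = $racy") // (5,3,0)
    println(s"guarded   (running, failed, skipped) = $fixed") // (5,0,3)
  }
}
```

In the unguarded model, the three surplus containers end up counted as failed. In the guarded model, the same three are skipped and none fail. Because the check and the increment share one lock, these totals do not depend on how the four launcher threads interleave.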
      

            People

            • Assignee: Kishor Patil (kishorvpatil)
            • Reporter: Kishor Patil (kishorvpatil)
            • Votes: 0
            • Watchers: 4