Uploaded image for project: 'Spark'
  1. Spark
  2. SPARK-43510

Spark application hangs when YarnAllocator adds running executors after processing completed containers

    XMLWordPrintableJSON

Details

    • Bug
    • Status: Resolved
    • Major
    • Resolution: Fixed
    • 3.4.0
    • 3.4.1, 3.5.0
    • YARN
    • None

    Description

      I see application hangs when containers are preempted immediately after allocation as follows.

      23/05/14 09:11:33 INFO YarnAllocator: Launching container container_e3812_1684033797982_57865_01_000382 on host hdc42-mcc10-01-0910-4207-015-tess0028.stratus.rno.ebay.com for executor with ID 277 for ResourceProfile Id 0 
      23/05/14 09:11:33 WARN YarnAllocator: Cannot find executorId for container: container_e3812_1684033797982_57865_01_000382
      23/05/14 09:11:33 INFO YarnAllocator: Completed container container_e3812_1684033797982_57865_01_000382 (state: COMPLETE, exit status: -102)
      23/05/14 09:11:33 INFO YarnAllocator: Container container_e3812_1684033797982_57865_01_000382 was preempted.

      Note the warning log where YarnAllocator cannot find executorId for the container when processing completed containers. The only plausible cause is YarnAllocator added the running executor after processing completed containers. The former happens in a separate thread after executor launch.

      YarnAllocator believes there are still running executors, although they are already lost due to preemption. Hence, the application hangs without any running executors.

      Attachments

        Activity

          People

            mauzhang Manu Zhang
            mauzhang Manu Zhang
            Votes:
            0 Vote for this issue
            Watchers:
            2 Start watching this issue

            Dates

              Created:
              Updated:
              Resolved: