Hadoop YARN / YARN-11636

App stuck in ACCEPTED state, but YARN metrics report no pending apps in the queue

Details

    • Type: Bug
    • Status: Open
    • Priority: Major
    • Resolution: Unresolved
    • Affects Version/s: 3.2.1
    • None
    • None
    • None

    Description

      Hi, I recently encountered a case where an app stays stuck in the ACCEPTED state indefinitely in a queue.

      The queue was busy for the first 4 hours after the app was queued, so being stuck in ACCEPTED during that time is expected. However, even after resources became available and all other jobs ran, this job remained stuck. I've checked the following:
      1. Resources are available at both the leaf queue and cluster level.
      2. Other jobs can get the resources to run.
      3. The app is not hitting maxAM limits. There are no other jobs queued or running in the queue at this time. However...
      4. The JMX metrics seem to think the app is running: AppsRunning says 1 and containersRunning says 1, while AppsPending says 0. Yet the app stays firmly in the ACCEPTED state and does not appear to be running.

      Is this known or have others encountered this issue before? Or do you have any advice on what I can look into to debug it? Thanks very much for the help.
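
      For anyone trying to reproduce the mismatch in point 4, here is a minimal sketch of how the check could be scripted. It assumes the queue metrics have already been read from the ResourceManager's JMX output (one entry of its "beans" list) and that the app list for the same queue comes from the ResourceManager REST API. The function names below are hypothetical helpers, not part of Hadoop, and the check rests on the assumption that an app in ACCEPTED would normally be counted under AppsPending.

```python
import json
from urllib.request import urlopen


def fetch_json(url):
    """Fetch a URL and parse the response body as JSON."""
    with urlopen(url) as resp:
        return json.load(resp)


def accepted_but_not_pending(queue_bean, apps):
    """Return IDs of ACCEPTED apps when the queue metrics report no pending apps.

    queue_bean: one QueueMetrics entry from the JMX "beans" list (assumed shape).
    apps: app dicts for that same queue, each with "id" and "state" keys.
    """
    accepted = [app["id"] for app in apps if app.get("state") == "ACCEPTED"]
    # Assumption: an ACCEPTED app should show up in AppsPending, so a zero
    # count alongside ACCEPTED apps is the inconsistency described above.
    if accepted and queue_bean.get("AppsPending", 0) == 0:
        return accepted
    return []
```

      `fetch_json` could be pointed at the ResourceManager's `/jmx` and `/ws/v1/cluster/apps` endpoints to obtain the two inputs; the exact query parameters and queue naming may differ between versions, so treat those details as something to verify against your cluster. A non-empty result would flag the same situation reported here.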

          People

            Assignee: Unassigned
            Reporter: Helen Weng (helenaut)