Hadoop YARN / YARN-3570

Non-zero exit status of master application not propagated

Details

    • Type: Bug
    • Status: Open
    • Priority: Major
    • Resolution: Unresolved
    • Affects Version/s: 2.4.0
    • Fix Version/s: None
    • Component/s: None
    • Labels: None
    • Environment: PySpark on AWS EMR

    Description

      The master of my application fails, but the reported "Final app status" is SUCCEEDED with exit code 0. This causes all sorts of problems (EMR does not detect the failure, my data pipeline continues, etc.).

      Here is what happens. The master fails (showing only relevant lines from daemons/i-…/yarn-hadoop-nodemanager-ip-….log.gz):

      2015-05-02 03:32:11,000 WARN org.apache.hadoop.yarn.server.nodemanager.DefaultContainerExecutor (ContainersLauncher #0): Exit code from container container_1430537363277_0001_01_000001 is : 1
      2015-05-02 03:32:11,001 WARN org.apache.hadoop.yarn.server.nodemanager.DefaultContainerExecutor (ContainersLauncher #0): Exception from container-launch with container ID: container_1430537363277_0001_01_000001 and exit code: 1
      2015-05-02 03:32:11,003 WARN org.apache.hadoop.yarn.server.nodemanager.containermanager.launcher.ContainerLaunch (ContainersLauncher #0): Container exited with a non-zero exit code 1
      2015-05-02 03:32:11,004 INFO org.apache.hadoop.yarn.server.nodemanager.containermanager.container.Container (AsyncDispatcher event handler): Container container_1430537363277_0001_01_000001 transitioned from RUNNING to EXITED_WITH_FAILURE
      2015-05-02 03:32:11,032 WARN org.apache.hadoop.yarn.server.nodemanager.NMAuditLogger (AsyncDispatcher event handler): USER=hadoop OPERATION=Container Finished - Failed TARGET=ContainerImpl RESULT=FAILURE DESCRIPTION=Container failed with state: EXITED_WITH_FAILURE APPID=application_1430537363277_0001 CONTAINERID=container_1430537363277_0001_01_000001
      2015-05-02 03:32:11,032 INFO org.apache.hadoop.yarn.server.nodemanager.containermanager.container.Container (AsyncDispatcher event handler): Container container_1430537363277_0001_01_000001 transitioned from EXITED_WITH_FAILURE to DONE

      and, from ./daemons/i-…/yarn-hadoop-resourcemanager-ip-….log.gz

      2015-05-02 03:32:10,493 INFO org.apache.hadoop.yarn.server.resourcemanager.rmapp.attempt.RMAppAttemptImpl (AsyncDispatcher event handler): Updating application attempt appattempt_1430537363277_0001_000001 with final state: FINISHING, and exit status: -1000

      Nonetheless, the application as a whole strangely reports a 0 exit code, in ./task-attempts/application_1430537363277_0001/container_1430537363277_0001_01_000001/stderr.gz:

      15/05/02 03:32:10 INFO yarn.ApplicationMaster: Final app status: SUCCEEDED, exitCode: 0, (reason: Shutdown hook called before final status was reported.)

      The reason given on this last line (a shutdown hook called before the final status was reported) may explain this "error hiding". Is this a possible YARN bug? Or is it more likely that the AWS EMR cluster manager I am using is responsible (perhaps it detects a task failure before YARN does and shuts down the PySpark application running on YARN)?
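
      Until the cause is known, the mismatch described above can at least be detected from the logs themselves. The following is a minimal sketch, not a fix: it assumes the NodeManager and ApplicationMaster messages are worded exactly as in the excerpts quoted in this report (other Hadoop or Spark versions may differ), and the function names are hypothetical helpers, not part of YARN or Spark.

```python
import re

# Patterns modelled on the log lines quoted in this report (assumption:
# other Hadoop or Spark versions may word these messages differently).
NM_EXIT_RE = re.compile(
    r"Exit code from container (?P<container>container_\w+) is : (?P<code>-?\d+)"
)
AM_FINAL_RE = re.compile(
    r"Final app status: (?P<status>\w+), exitCode: (?P<code>-?\d+)"
)


def container_exit_codes(nm_log_lines):
    """Map container ID -> exit code reported by the NodeManager."""
    codes = {}
    for line in nm_log_lines:
        match = NM_EXIT_RE.search(line)
        if match:
            codes[match.group("container")] = int(match.group("code"))
    return codes


def final_app_status(am_log_lines):
    """Return (status, exit_code) from the ApplicationMaster log, or None."""
    for line in am_log_lines:
        match = AM_FINAL_RE.search(line)
        if match:
            return match.group("status"), int(match.group("code"))
    return None


def is_hidden_failure(nm_log_lines, am_log_lines, master_container):
    """True if the master container failed but the AM reported exit code 0."""
    nm_code = container_exit_codes(nm_log_lines).get(master_container)
    final = final_app_status(am_log_lines)
    if nm_code is None or final is None:
        return False  # not enough information to decide
    return nm_code != 0 and final[1] == 0
```

      Since the logs above are gzip-compressed, the lines can be read with gzip.open(path, "rt") from the Python standard library and passed to is_hidden_failure together with the ID of the master container (container_1430537363277_0001_01_000001 in this report). A pipeline could then treat a True result as a failure even though the final status says SUCCEEDED.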

          People

            Assignee: Unassigned
            Reporter: Eric O. LEBIGOT (EOL) (username: lebigot)