Details
- Type: Improvement
- Status: Resolved
- Priority: Minor
- Resolution: Fixed
- Affects Version/s: 2.3.0
- Labels: None
Description
Today, when the client AM fails, very little useful information is printed to the output. Depending on the type of failure, the information provided by the YARN AM is also not very useful. For example, you'd see this in the Spark shell:
```
18/05/04 11:07:38 ERROR spark.SparkContext: Error initializing SparkContext.
org.apache.spark.SparkException: Yarn application has already ended! It might have been killed or unable to launch application master.
  at org.apache.spark.scheduler.cluster.YarnClientSchedulerBackend.waitForApplication(YarnClientSchedulerBackend.scala:86)
  at org.apache.spark.scheduler.cluster.YarnClientSchedulerBackend.start(YarnClientSchedulerBackend.scala:63)
  at org.apache.spark.scheduler.TaskSchedulerImpl.start(TaskSchedulerImpl.scala:164)
  at org.apache.spark.SparkContext.<init>(SparkContext.scala:500)
  [long stack trace]
```
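The YARN ApplicationReport for the failed application usually carries more detail than this generic message. As a rough sketch (not the actual change, and `describeFailure` is a hypothetical helper introduced only for illustration), the client could fetch the report's diagnostics and fold them into the exception it throws:

```scala
import org.apache.hadoop.yarn.api.records.ApplicationId
import org.apache.hadoop.yarn.client.api.YarnClient

// Hypothetical helper: build an error message from what YARN already knows
// about the failed application, instead of a one-size-fits-all string.
def describeFailure(yarnClient: YarnClient, appId: ApplicationId): String = {
  val report = yarnClient.getApplicationReport(appId)
  val diags = Option(report.getDiagnostics).map(_.trim).filter(_.nonEmpty)
    .getOrElse("(no diagnostics reported by YARN)")
  s"Application $appId finished with state ${report.getYarnApplicationState}. " +
    s"Diagnostics:\n$diags"
}
```

`YarnClientSchedulerBackend.waitForApplication` could then throw a `SparkException` built from such a string, so the user sees the YARN-reported cause rather than only "Yarn application has already ended!".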
Similarly, on the YARN RM, for certain failures you see a generic error like this:
```
ExitCodeException exitCode=10:
  at org.apache.hadoop.util.Shell.runCommand(Shell.java:543)
  at org.apache.hadoop.util.Shell.run(Shell.java:460)
  at org.apache.hadoop.util.Shell$ShellCommandExecutor.execute(Shell.java:720)
  at org.apache.hadoop.yarn.server.nodemanager.LinuxContainerExecutor.launchContainer(LinuxContainerExecutor.java:366)
  at [blah blah blah]
```
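On the cluster side, one way to avoid this bare exit-code trace is for the AM to unregister from the RM with an explicit diagnostics message, which the RM then shows on the application page. A minimal sketch, assuming the AM has already caught the fatal error (`reportFailure` is a hypothetical helper, not an existing Spark method):

```scala
import org.apache.hadoop.yarn.api.records.FinalApplicationStatus
import org.apache.hadoop.yarn.client.api.AMRMClient

// Hypothetical helper: unregister with a human-readable cause so the RM
// surfaces the root failure instead of a generic ExitCodeException.
def reportFailure(amClient: AMRMClient[_], cause: Throwable,
    trackingUrl: String): Unit = {
  val msg = s"Application master failed: ${cause.getMessage}"
  amClient.unregisterApplicationMaster(FinalApplicationStatus.FAILED, msg, trackingUrl)
}
```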
It would be nice if, whenever possible, we could provide a more accurate description of what went wrong.