Details
- Bug
- Status: Resolved
- Major
- Resolution: Fixed
- 2.7.3
- None
Description
When a Spark job throws an exception whose message contains a character outside the range supported by XML 1.0,
the application fails and the stack trace is stored in the diagnostics field. So far, so good.
The problem appears when we request the application information through the ResourceManager REST API:
the XML response contains the illegal XML 1.0 character and is therefore invalid.
Examples of illegal characters in XML 1.0:
- \u0000
- \u0001
- \u0002
- \u0003
- \u0004
For more information about the supported characters:
https://www.w3.org/TR/xml/#charsets
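The character ranges from the specification linked above can be turned into a small check. The following is only a minimal sketch, not the fix that was applied for this issue: the class name `Xml10Chars` and both method names are hypothetical, and replacing an illegal character with a placeholder is just one possible strategy (escaping or dropping it are others).

```java
// Hypothetical helper, not part of Hadoop: checks text against the
// XML 1.0 "Char" production and replaces anything outside it.
final class Xml10Chars {
    private Xml10Chars() {}

    /** Returns true if the code point is allowed by the XML 1.0 Char production. */
    static boolean isLegal(int codePoint) {
        return codePoint == 0x9 || codePoint == 0xA || codePoint == 0xD
            || (codePoint >= 0x20 && codePoint <= 0xD7FF)
            || (codePoint >= 0xE000 && codePoint <= 0xFFFD)
            || (codePoint >= 0x10000 && codePoint <= 0x10FFFF);
    }

    /** Replaces every code point that is illegal in XML 1.0 with the given character. */
    static String replaceIllegal(String text, char replacement) {
        StringBuilder out = new StringBuilder(text.length());
        for (int i = 0; i < text.length(); ) {
            int codePoint = text.codePointAt(i);
            if (isLegal(codePoint)) {
                out.appendCodePoint(codePoint);
            } else {
                // Covers control characters such as \u0001 and unpaired surrogates.
                out.append(replacement);
            }
            i += Character.charCount(codePoint);
        }
        return out.toString();
    }
}
```

Applied to a diagnostics message before it is serialized, a call such as `Xml10Chars.replaceIllegal(message, '?')` would keep the rest of the stack trace readable while removing the characters that make the XML invalid.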
Example of an invalid response from the ResourceManager API:
<?xml version="1.0" encoding="UTF-8" standalone="yes"?>
<app>
  <id>application_1326821518301_0005</id>
  <user>user1</user>
  <name>job</name>
  <queue>a1</queue>
  <state>FINISHED</state>
  <finalStatus>FAILED</finalStatus>
  <progress>100.0</progress>
  <trackingUI>History</trackingUI>
  <trackingUrl>http://host.domain.com:8088/proxy/application_1326821518301_0005/jobhistory/job/job_1326821518301_5_5</trackingUrl>
  <diagnostics>Exception in thread "main" java.lang.Exception: \u0001
    at com.XXXXXXXX.main(JobWithSpecialCharMain.java:6)</diagnostics>
  [...]
</app>
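To see why a client cannot consume such a response, one can feed a document containing the raw U+0001 character to a standard XML parser. This is a rough sketch using the JDK's built-in JAXP parser; the class name `InvalidResponseCheck` and the method `isWellFormed` are hypothetical names chosen for illustration, and the shortened `<app>` document in the usage note stands in for the real response.

```java
import java.io.IOException;
import java.io.StringReader;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.parsers.ParserConfigurationException;
import org.xml.sax.InputSource;
import org.xml.sax.SAXException;

// Hypothetical helper, for illustration only.
final class InvalidResponseCheck {
    private InvalidResponseCheck() {}

    /** Returns true if the JDK's default XML parser accepts the document. */
    static boolean isWellFormed(String xml) {
        try {
            DocumentBuilderFactory.newInstance()
                .newDocumentBuilder()
                .parse(new InputSource(new StringReader(xml)));
            return true;
        } catch (SAXException e) {
            // The parser rejects the document, e.g. because of an illegal character.
            return false;
        } catch (IOException | ParserConfigurationException e) {
            // Not expected for an in-memory string with default settings.
            throw new IllegalStateException(e);
        }
    }
}
```

With this helper, `isWellFormed("<app><diagnostics>java.lang.Exception: \u0001</diagnostics></app>")` should report the document as not well-formed, which is the failure a REST client would hit when parsing the response above.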
Example job to reproduce the issue:
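// Package declaration added so that the class matches the --class argument
// and the com/ directory used by the commands that follow.
package com;

public class JobWithSpecialCharMain {
    public static void main(String[] args) throws Exception {
        // \u0001 is not a legal XML 1.0 character.
        throw new Exception("\u0001");
    }
}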
javac -d . JobWithSpecialCharMain.java
jar cvf repro.jar com/
spark-submit --class com.JobWithSpecialCharMain --master yarn-cluster repro.jar