Details
- Bug
- Status: Resolved
- Minor
- Resolution: Fixed
- 1.5.0
- None
Description
I've recently been shell-scripting the creation of many concurrent Spark-on-YARN apps and have observed that a fraction of them fail with what I'm guessing is a race condition in their Maven-coordinate resolution.
For example, I might spawn one app for each path listed in a file named paths with the following shell command:
cat paths | parallel "$SPARK_HOME/bin/spark-submit foo.jar {}"
When doing this, some fraction of the spawned jobs fail with errors like:
:: retrieving :: org.apache.spark#spark-submit-parent
    confs: [default]
Exception in thread "main" java.lang.RuntimeException: problem during retrieve of org.apache.spark#spark-submit-parent: java.text.ParseException: failed to parse report: /hpc/users/willir31/.ivy2/cache/org.apache.spark-spark-submit-parent-default.xml: Premature end of file.
    at org.apache.ivy.core.retrieve.RetrieveEngine.retrieve(RetrieveEngine.java:249)
    at org.apache.ivy.core.retrieve.RetrieveEngine.retrieve(RetrieveEngine.java:83)
    at org.apache.ivy.Ivy.retrieve(Ivy.java:551)
    at org.apache.spark.deploy.SparkSubmitUtils$.resolveMavenCoordinates(SparkSubmit.scala:1006)
    at org.apache.spark.deploy.SparkSubmit$.prepareSubmitEnvironment(SparkSubmit.scala:286)
    at org.apache.spark.deploy.SparkSubmit$.submit(SparkSubmit.scala:153)
    at org.apache.spark.deploy.SparkSubmit$.main(SparkSubmit.scala:120)
    at org.apache.spark.deploy.SparkSubmit.main(SparkSubmit.scala)
Caused by: java.text.ParseException: failed to parse report: /hpc/users/willir31/.ivy2/cache/org.apache.spark-spark-submit-parent-default.xml: Premature end of file.
    at org.apache.ivy.plugins.report.XmlReportParser.parse(XmlReportParser.java:293)
    at org.apache.ivy.core.retrieve.RetrieveEngine.determineArtifactsToCopy(RetrieveEngine.java:329)
    at org.apache.ivy.core.retrieve.RetrieveEngine.retrieve(RetrieveEngine.java:118)
    ... 7 more
Caused by: org.xml.sax.SAXParseException; Premature end of file.
    at org.apache.xerces.util.ErrorHandlerWrapper.createSAXParseException(Unknown Source)
    at org.apache.xerces.util.ErrorHandlerWrapper.fatalError(Unknown Source)
    at org.apache.xerces.impl.XMLErrorReporter.reportError(Unknown Source)
The more apps I try to launch simultaneously, the greater the fraction of them that fail with this or similar errors: a batch of ~10 will usually work fine, a batch of ~15 will see a few failures, and a batch of ~60 will have dozens of failures.
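A possible workaround, sketched here under assumptions rather than verified: if the failures come from all jobs reading and writing the same report file under ~/.ivy2/cache, then giving each launch its own Ivy directory should keep them from colliding. The sketch uses Spark's spark.jars.ivy property for that; whether it also relocates the report file named in the trace above is an assumption. The function name submit_isolated and the SPARK_SUBMIT override variable are hypothetical, introduced only for illustration.

```shell
#!/bin/sh
# Hypothetical wrapper: run one spark-submit with a private, throwaway Ivy directory.
# Assumption: the race is on the shared report file in ~/.ivy2/cache, and
# spark.jars.ivy moves that file along with the rest of the Ivy cache.
submit_isolated() {
    path="$1"
    # Create a unique Ivy directory for this launch only.
    ivy_dir=$(mktemp -d "${TMPDIR:-/tmp}/ivy-XXXXXX") || return 1
    # SPARK_SUBMIT is a hypothetical override, handy for dry runs;
    # by default the real spark-submit under SPARK_HOME is used.
    "${SPARK_SUBMIT:-$SPARK_HOME/bin/spark-submit}" \
        --conf "spark.jars.ivy=$ivy_dir" \
        foo.jar "$path"
    status=$?
    # Remove the private directory once the submission has returned.
    rm -rf "$ivy_dir"
    return "$status"
}
```

It could be driven from the same paths file with a plain loop such as `while read -r p; do submit_isolated "$p" & done < paths; wait`. The trade-off is that every launch starts with an empty Ivy cache and resolves its dependencies from scratch, so this trades speed for isolation.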
Issue Links
- is duplicated by SPARK-21507 Exception when using spark.jars.packages (Resolved)