Spark / SPARK-3404

SparkSubmitSuite fails with "spark-submit exits with code 1"


Details

    • Type: Bug
    • Status: Resolved
    • Priority: Critical
    • Resolution: Fixed
    • Affects Version/s: 1.0.2, 1.1.0
    • Fix Version/s: 1.1.1, 1.2.0
    • Component/s: Build
    • Labels: None

    Description

      Maven-based Jenkins builds have been failing for over a month. For example:
      https://amplab.cs.berkeley.edu/jenkins/job/Spark-Master-Maven-pre-YARN/

      It's SparkSubmitSuite that fails. For example:
      https://amplab.cs.berkeley.edu/jenkins/job/Spark-Master-Maven-pre-YARN/541/hadoop.version=2.0.0-mr1-cdh4.1.2,label=centos/consoleFull

      SparkSubmitSuite
      ...
      - launch simple application with spark-submit *** FAILED ***
        org.apache.spark.SparkException: Process List(./bin/spark-submit, --class, org.apache.spark.deploy.SimpleApplicationTest, --name, testApp, --master, local, file:/tmp/1409815981504-0/testJar-1409815981505.jar) exited with code 1
        at org.apache.spark.util.Utils$.executeAndGetOutput(Utils.scala:837)
        at org.apache.spark.deploy.SparkSubmitSuite.runSparkSubmit(SparkSubmitSuite.scala:311)
        at org.apache.spark.deploy.SparkSubmitSuite$$anonfun$14.apply$mcV$sp(SparkSubmitSuite.scala:291)
        at org.apache.spark.deploy.SparkSubmitSuite$$anonfun$14.apply(SparkSubmitSuite.scala:284)
        at org.apache.spark.deploy.SparkSubmitSuite$$anonfun$14.apply(SparkSubmitSuite.scala:284)
        at org.scalatest.Transformer$$anonfun$apply$1.apply(Transformer.scala:22)
        at org.scalatest.Transformer$$anonfun$apply$1.apply(Transformer.scala:22)
        at org.scalatest.OutcomeOf$class.outcomeOf(OutcomeOf.scala:85)
        at org.scalatest.OutcomeOf$.outcomeOf(OutcomeOf.scala:104)
        at org.scalatest.Transformer.apply(Transformer.scala:22)
        ...
      - spark submit includes jars passed in through --jar *** FAILED ***
        org.apache.spark.SparkException: Process List(./bin/spark-submit, --class, org.apache.spark.deploy.JarCreationTest, --name, testApp, --master, local-cluster[2,1,512], --jars, file:/tmp/1409815984960-0/testJar-1409815985029.jar,file:/tmp/1409815985030-0/testJar-1409815985087.jar, file:/tmp/1409815984959-0/testJar-1409815984959.jar) exited with code 1
        at org.apache.spark.util.Utils$.executeAndGetOutput(Utils.scala:837)
        at org.apache.spark.deploy.SparkSubmitSuite.runSparkSubmit(SparkSubmitSuite.scala:311)
        at org.apache.spark.deploy.SparkSubmitSuite$$anonfun$15.apply$mcV$sp(SparkSubmitSuite.scala:305)
        at org.apache.spark.deploy.SparkSubmitSuite$$anonfun$15.apply(SparkSubmitSuite.scala:294)
        at org.apache.spark.deploy.SparkSubmitSuite$$anonfun$15.apply(SparkSubmitSuite.scala:294)
        at org.scalatest.Transformer$$anonfun$apply$1.apply(Transformer.scala:22)
        at org.scalatest.Transformer$$anonfun$apply$1.apply(Transformer.scala:22)
        at org.scalatest.OutcomeOf$class.outcomeOf(OutcomeOf.scala:85)
        at org.scalatest.OutcomeOf$.outcomeOf(OutcomeOf.scala:104)
        at org.scalatest.Transformer.apply(Transformer.scala:22)
        ...
      

      SBT builds don't fail, so this is likely due to some difference in how the tests are run rather than a problem with the tests or the core project itself.

      This is related to http://issues.apache.org/jira/browse/SPARK-3330, but the cause identified in that JIRA is, at the least, not the only cause. (Although it wouldn't hurt to be doubly sure this is not an issue by changing the Jenkins config to invoke "mvn clean && mvn ... package" instead of "mvn ... clean package".)

      This JIRA tracks investigation into a different cause. Right now I have some further information but not a PR yet.

      Part of the issue is that there is no clue in the log about why spark-submit exited with status 1. See https://github.com/apache/spark/pull/2108/files and https://issues.apache.org/jira/browse/SPARK-3193 for a change that would at least print stdout to the log too.
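      To illustrate the kind of change those links describe, here is a minimal sketch of a debugging helper that runs the child process and mirrors both output streams to the test log, so a nonzero exit code at least comes with some context. The helper name and the use of scala.sys.process are my own assumptions for illustration; this is not the actual Utils.executeAndGetOutput nor the code in that PR.

      import java.io.File
      import scala.sys.process._

      // Hypothetical debugging helper: run a command, echo its stdout/stderr
      // to the console, and return the exit code.
      def runAndLog(command: Seq[String], workingDir: File): Int = {
        val exitCode = Process(command, workingDir) ! ProcessLogger(
          out => println(s"[stdout] $out"),
          err => println(s"[stderr] $err"))
        if (exitCode != 0) {
          println(s"Command '${command.mkString(" ")}' exited with code $exitCode")
        }
        exitCode
      }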

      The SparkSubmit program exits with 1 when the main class it is supposed to run is not found (https://github.com/apache/spark/blob/master/core/src/main/scala/org/apache/spark/deploy/SparkSubmit.scala#L322). In this case the main class is, for example, SimpleApplicationTest (https://github.com/apache/spark/blob/master/core/src/test/scala/org/apache/spark/deploy/SparkSubmitSuite.scala#L339).
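      For reference, that failure path is essentially a class lookup. The snippet below is only an illustration of a lookup that exits with status 1 on ClassNotFoundException, consistent with the exit code seen above; it is not the actual SparkSubmit source.

      // Illustrative sketch only, not the real SparkSubmit logic.
      val childMainClass = "org.apache.spark.deploy.SimpleApplicationTest"
      try {
        Class.forName(childMainClass, true, Thread.currentThread().getContextClassLoader)
      } catch {
        case _: ClassNotFoundException =>
          System.err.println(s"Cannot load main class $childMainClass")
          System.exit(1)
      }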

      The test actually submits an empty JAR that does not contain this class. It relies on spark-submit finding the class among the compiled test-classes of the Spark project. However, the class does seem to be compiled and present even under the Maven build.

      If modified to print stdout and stderr, and dump the actual command, I see an empty stdout, and only the command to stderr:

      Spark Command: /Library/Java/JavaVirtualMachines/jdk1.8.0_20.jdk/Contents/Home/bin/java -cp null::/Users/srowen/Documents/spark/conf:/Users/srowen/Documents/spark/assembly/target/scala-2.10/spark-assembly-1.1.0-SNAPSHOT-hadoop1.0.4.jar:/Users/srowen/Documents/spark/core/target/scala-2.10/test-classes:/Users/srowen/Documents/spark/repl/target/scala-2.10/test-classes:/Users/srowen/Documents/spark/mllib/target/scala-2.10/test-classes:/Users/srowen/Documents/spark/bagel/target/scala-2.10/test-classes:/Users/srowen/Documents/spark/graphx/target/scala-2.10/test-classes:/Users/srowen/Documents/spark/streaming/target/scala-2.10/test-classes:/Users/srowen/Documents/spark/sql/catalyst/target/scala-2.10/test-classes:/Users/srowen/Documents/spark/sql/core/target/scala-2.10/test-classes:/Users/srowen/Documents/spark/sql/hive/target/scala-2.10/test-classes:/Users/srowen/Documents/Cloudera/bottou/hadoop-conf/ -Xms512m -Xmx512m org.apache.spark.deploy.SparkSubmit --class org.apache.spark.deploy.JarCreationTest --name testApp --master local-cluster[2,1,512] --jars file:/var/folders/vl/nbmbr36j0692ch5r98b5cn040000gn/T/1409845282367-0/testJar-1409845282404.jar,file:/var/folders/vl/nbmbr36j0692ch5r98b5cn040000gn/T/1409845282405-0/testJar-1409845282436.jar file:/var/folders/vl/nbmbr36j0692ch5r98b5cn040000gn/T/1409845282366-0/testJar-1409845282366.jar
      

      Strangely, while the tests fail under mvn test, they pass if I run just SparkSubmitSuite in Maven with mvn -DwildcardSuites=org.apache.spark.deploy.SparkSubmitSuite test -rf :spark-core_2.10

      It does seem to have to do with the classpath that spark-submit picks up, which varies across these different scenarios.

      I verified that the test suite and Jenkins set SPARK_TESTING=1, since that affects access to test-classes on the classpath in spark-submit.
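      As a sanity check, something like the following could be dropped into the test to confirm that SPARK_TESTING is visible and that a given classpath string (for example, the one dumped by the spark-submit script above) actually includes the core test-classes directory that should provide SimpleApplicationTest. This is a hypothetical diagnostic of my own, not part of SparkSubmitSuite.

      // Hypothetical diagnostic, not part of the suite.
      def checkTestClasspath(classpath: String): Unit = {
        println(s"SPARK_TESTING=${sys.env.getOrElse("SPARK_TESTING", "<unset>")}")
        val hasTestClasses = classpath
          .split(java.io.File.pathSeparator)
          .exists(_.endsWith("core/target/scala-2.10/test-classes"))
        println(s"core test-classes on classpath: $hasTestClasses")
      }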

      I'm still investigating, but I'm posting this to track the issue, which is rather bothersome since it means Jenkins isn't catching (other) Maven build problems, and to see if anyone has bright ideas.


People

    Assignee: Unassigned
    Reporter: Sean R. Owen (srowen)
    Votes: 0
    Watchers: 2
