  Zeppelin / ZEPPELIN-3749

New Spark interpreter has to be restarted two times in order to work fine for different users


Details

    • Type: Bug
    • Status: Closed
    • Priority: Major
    • Resolution: Fixed
    • Affects Version/s: 0.8.0, 0.8.1
    • Fix Version/s: 0.8.1, 0.9.0
    • Component/s: Interpreters
    • Labels: None

    Description

      The new Spark interpreter has to be restarted twice in order to work for different users.

      To reproduce this, configure Zeppelin to use the new Spark interpreter:
      zeppelin.spark.useNew -> true

      and set the interpreter instantiation mode to: per user - scoped
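
      In the Spark interpreter setting on the Interpreters page this corresponds roughly to the following (a sketch, not taken from the report; the property name is from this issue, the option wording is assumed and varies by Zeppelin version):

      zeppelin.spark.useNew        true
      The interpreter will be instantiated: Per User, in scoped process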

      Steps to reproduce:
      1. User A logs in to Zeppelin and runs a Spark paragraph. It works fine.
      2. User B logs in to Zeppelin and runs a Spark paragraph, for example:

      %spark
      println(sc.version)
      println(scala.util.Properties.versionString)
      

      3. This error appears (see the full log trace in first_error.txt; a diagnostic sketch for inspecting the interpreter state follows these steps):
      java.lang.IllegalStateException: Cannot call methods on a stopped SparkContext. This stopped SparkContext was created at: .....
      4. User B restarts the Spark interpreter from the notebook page and then runs a paragraph that submits a Spark job, for example:

      import sqlContext.implicits._
      import org.apache.commons.io.IOUtils
      import java.net.URL
      import java.nio.charset.Charset
      
      // Zeppelin creates and injects sc (SparkContext) and sqlContext (HiveContext or SqlContext)
      // So you don't need to create them manually
      
      // load bank data
      val bankText = sc.parallelize(
          IOUtils.toString(
              new URL("https://s3.amazonaws.com/apache-zeppelin/tutorial/bank/bank.csv"),
              Charset.forName("utf8")).split("\n"))
      
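      // This foreach action runs its closure on the executors, so they must be able to
      // load the REPL-generated anonymous function class (this is what fails in step 5)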
      sc.parallelize(1 to 1000000).foreach(n => print((java.lang.Math.random() * 1000000) + n))
      
      case class Bank(age: Integer, job: String, marital: String, education: String, balance: Integer)
      
      val bank = bankText.map(s => s.split(";")).filter(s => s(0) != "\"age\"").map(
          s => Bank(s(0).toInt, 
                  s(1).replaceAll("\"", ""),
                  s(2).replaceAll("\"", ""),
                  s(3).replaceAll("\"", ""),
                  s(5).replaceAll("\"", "").toInt
              )
      ).toDF()
      bank.registerTempTable("bank")
      

      5. This error appears (see the full log trace in second_error.txt):
      org.apache.spark.SparkException: Job aborted due to stage failure: Task 6 in stage 0.0 failed 4 times, most recent failure: Lost task 6.3 in stage 0.0 (TID 36, 100.96.85.172, executor 2): java.lang.ClassNotFoundException: $anonfun$1
          at org.apache.spark.repl.ExecutorClassLoader.findClass(ExecutorClassLoader.scala:82)
          at java.lang.ClassLoader.loadClass(ClassLoader.java:424)
          at java.lang.ClassLoader.loadClass(ClassLoader.java:357)
          at java.lang.Class.forName0(Native Method)
          at java.lang.Class.forName(Class.java:348)
          at org.apache.spark.serializer.JavaDeserializationStream$$anon$1.resolveClass(JavaSerializer.scala:67)
          .....
      6. User B restarts the Spark interpreter from the notebook page a second time, and now it works.
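
      The following %spark paragraph is a minimal diagnostic sketch (not part of the original report) for inspecting the interpreter state between the steps above. It assumes Spark 2.x, where the driver publishes REPL-generated classes to executors via the spark.repl.class.outputDir / spark.repl.class.uri properties; treat those names as assumptions for other versions.

      %spark
      // Is the SparkContext the error in step 3 complains about actually stopped?
      println("isStopped      = " + sc.isStopped)
      // Changes only when a new SparkContext is really created after a restart
      println("applicationId  = " + sc.applicationId)
      // Directory the driver serves REPL-generated classes from; if the running
      // context predates the current REPL session, executors cannot resolve
      // classes such as $anonfun$1 (the error in step 5)
      println("repl class dir = " + sc.getConf.get("spark.repl.class.outputDir", "<not set>"))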

      Actual Behavior:
      User B has to restart the Spark interpreter twice before it works.

      Expected Behavior:
      The Spark interpreter should work for additional users without any restarts.

      Attachments

        1. first_error.txt
          7 kB
          Jhon Cardenas
        2. second_error.txt
          8 kB
          Jhon Cardenas


            People

              Assignee: Jeff Zhang (zjffdu)
              Reporter: Jhon Cardenas (jcardenasd)
              Votes: 0
              Watchers: 3
