  Zeppelin / ZEPPELIN-3749

New Spark interpreter has to be restarted two times in order to work fine for different users


Details

    • Type: Bug
    • Status: Closed
    • Priority: Major
    • Resolution: Fixed
    • Affects Version/s: 0.8.0, 0.8.1
    • Fix Version/s: 0.8.1, 0.9.0
    • Component/s: Interpreters
    • Labels: None

    Description

      The new Spark interpreter has to be restarted twice in order to work for different users.

      To reproduce this, configure Zeppelin to use the new Spark interpreter:
      zeppelin.spark.useNew -> true

      and set the interpreter instantiation mode to: per user - scoped
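
      In the Spark interpreter setting on the Interpreters page this corresponds roughly to the following (a sketch, not taken from the report; the property name is from this issue, the option wording is assumed and varies by Zeppelin version):

      zeppelin.spark.useNew        true
      The interpreter will be instantiated: Per User, in scoped process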

      Steps to reproduce:
      1. User A logs in to Zeppelin and runs a Spark paragraph. It works fine.
      2. User B logs in to Zeppelin and runs a Spark paragraph, for example:

      %spark
      println(sc.version)
      println(scala.util.Properties.versionString)
      

      3. This error appears (see the full log trace in first_error.txt; a diagnostic sketch for inspecting the interpreter state follows these steps):
      java.lang.IllegalStateException: Cannot call methods on a stopped SparkContext. This stopped SparkContext was created at: .....
      4. User B restarts the Spark interpreter from the notebook page and then runs a paragraph that submits a Spark job, for example:

      import sqlContext.implicits._
      import org.apache.commons.io.IOUtils
      import java.net.URL
      import java.nio.charset.Charset
      
      // Zeppelin creates and injects sc (SparkContext) and sqlContext (HiveContext or SqlContext)
      // So you don't need to create them manually
      
      // load bank data
      val bankText = sc.parallelize(
          IOUtils.toString(
              new URL("https://s3.amazonaws.com/apache-zeppelin/tutorial/bank/bank.csv"),
              Charset.forName("utf8")).split("\n"))
      
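      // This foreach action runs its closure on the executors, so they must be able to
      // load the REPL-generated anonymous function class (this is what fails in step 5)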
      sc.parallelize(1 to 1000000).foreach(n => print((java.lang.Math.random() * 1000000) + n))
      
      case class Bank(age: Integer, job: String, marital: String, education: String, balance: Integer)
      
      val bank = bankText.map(s => s.split(";")).filter(s => s(0) != "\"age\"").map(
          s => Bank(s(0).toInt, 
                  s(1).replaceAll("\"", ""),
                  s(2).replaceAll("\"", ""),
                  s(3).replaceAll("\"", ""),
                  s(5).replaceAll("\"", "").toInt
              )
      ).toDF()
      bank.registerTempTable("bank")
      

      5. This error appears (see the full log trace in second_error.txt):
      org.apache.spark.SparkException: Job aborted due to stage failure: Task 6 in stage 0.0 failed 4 times, most recent failure: Lost task 6.3 in stage 0.0 (TID 36, 100.96.85.172, executor 2): java.lang.ClassNotFoundException: $anonfun$1
          at org.apache.spark.repl.ExecutorClassLoader.findClass(ExecutorClassLoader.scala:82)
          at java.lang.ClassLoader.loadClass(ClassLoader.java:424)
          at java.lang.ClassLoader.loadClass(ClassLoader.java:357)
          at java.lang.Class.forName0(Native Method)
          at java.lang.Class.forName(Class.java:348)
          at org.apache.spark.serializer.JavaDeserializationStream$$anon$1.resolveClass(JavaSerializer.scala:67)
          .....
      6. User B restarts the Spark interpreter from the notebook page a second time, and now it works.
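
      The following %spark paragraph is a minimal diagnostic sketch (not part of the original report) for inspecting the interpreter state between the steps above. It assumes Spark 2.x, where the driver publishes REPL-generated classes to executors via the spark.repl.class.outputDir / spark.repl.class.uri properties; treat those names as assumptions for other versions.

      %spark
      // Is the SparkContext the error in step 3 complains about actually stopped?
      println("isStopped      = " + sc.isStopped)
      // Changes only when a new SparkContext is really created after a restart
      println("applicationId  = " + sc.applicationId)
      // Directory the driver serves REPL-generated classes from; if the running
      // context predates the current REPL session, executors cannot resolve
      // classes such as $anonfun$1 (the error in step 5)
      println("repl class dir = " + sc.getConf.get("spark.repl.class.outputDir", "<not set>"))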

      Actual Behavior:
      User B has to restart the Spark interpreter twice before it works.

      Expected Behavior:
      The Spark interpreter should work for additional users without any restarts.

      Attachments

        1. first_error.txt
          7 kB
          Jhon Cardenas
        2. second_error.txt
          8 kB
          Jhon Cardenas


            People

              Assignee: Jeff Zhang (zjffdu)
              Reporter: Jhon Cardenas (jcardenasd)
              Votes: 0
              Watchers: 3
