Description
Versions used:
- EMR 5.16.0
- Spark 2.3.1
- Livy 0.5.0
- Zeppelin 0.7.3
Relevant livy.conf settings:
# What spark master Livy sessions should use.
livy.spark.master yarn
livy.spark.deploy-mode cluster
I could not get a simple SparkR-only snippet to run in Zeppelin via the Livy interpreter; it fails with "Fail to start interpreter". The same code works fine in yarn-client mode. Can you please shed some light on where exactly Livy looks for the SparkR package? The issue is consistently reproducible.
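For what it's worth, the failure can be reproduced without Zeppelin by asking Livy directly for a SparkR session over its REST API. The sketch below only builds the request (host and port are assumptions; 8998 is Livy's default):

```python
import json
from urllib import request

LIVY_URL = "http://localhost:8998"  # assumption: Livy server at its default port

def create_sparkr_session(url: str = LIVY_URL) -> request.Request:
    """Build the POST /sessions request that asks Livy for a SparkR REPL."""
    payload = json.dumps({"kind": "sparkr"}).encode("utf-8")
    return request.Request(
        url + "/sessions",
        data=payload,
        headers={"Content-Type": "application/json"},
        method="POST",
    )

req = create_sparkr_session()
# urllib.request.urlopen(req) would submit it; in yarn-cluster mode the
# resulting session then fails to start its sparkr interpreter.
print(req.full_url, req.get_method())
```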
As per https://github.com/apache/incubator-livy/blob/2196302731590def9a8f8a25628dd302eac06260/repl/src/main/scala/org/apache/livy/repl/SparkRInterpreter.scala , it looks like Livy is not able to find the SparkR package. Where exactly does it look for "packageDir" when this branch is taken?
if (sys.env.getOrElse("SPARK_YARN_MODE", "") == "true" ||
    (conf.get("spark.master", "").toLowerCase == "yarn" &&
     conf.get("spark.submit.deployMode", "").toLowerCase == "cluster"))
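My reading of the linked source, sketched in Python below (a hedged re-implementation, not Livy's actual code): when that condition is true, the interpreter appears to resolve a relative `sparkr` directory against the YARN container's working directory, so the SparkR archive must have been shipped into the container; otherwise it falls back to the R library under `SPARK_HOME`. The exact paths here are assumptions.

```python
import os

def find_sparkr_package_dir(conf: dict, env: dict) -> str:
    """Sketch of SparkRInterpreter's packageDir lookup (assumption based on
    the linked Scala source, not a verbatim port)."""
    yarn_cluster = (
        env.get("SPARK_YARN_MODE", "") == "true"
        or (conf.get("spark.master", "").lower() == "yarn"
            and conf.get("spark.submit.deployMode", "").lower() == "cluster")
    )
    if yarn_cluster:
        # Relative to the YARN container's working directory: the sparkr
        # archive must have been distributed and unpacked there.
        return "./sparkr"
    # Client/local mode: fall back to the SparkR library under SPARK_HOME.
    return os.path.join(env.get("SPARK_HOME", "."), "R", "lib")

# With the livy.conf above, the yarn-cluster branch is taken:
print(find_sparkr_package_dir(
    {"spark.master": "yarn", "spark.submit.deployMode": "cluster"}, {}))
# prints ./sparkr
```

If that reading is right, "Cannot find sparkr package directory." would mean no `sparkr` directory exists in the container's working directory at startup.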
The YARN container logs show the following (the sequence repeats on each retry):
18/08/16 03:02:19 INFO SparkEntries: Created Spark session.
Exception in thread "SparkR backend" java.lang.ClassCastException: scala.Tuple2 cannot be cast to java.lang.Integer
at scala.runtime.BoxesRunTime.unboxToInt(BoxesRunTime.java:101)
at org.apache.livy.repl.SparkRInterpreter$$anon$1.run(SparkRInterpreter.scala:83)
18/08/16 03:04:27 WARN Session: Fail to start interpreter sparkr
java.lang.IllegalArgumentException: requirement failed: Cannot find sparkr package directory.
at scala.Predef$.require(Predef.scala:224)
at org.apache.livy.repl.SparkRInterpreter$.apply(SparkRInterpreter.scala:108)
at org.apache.livy.repl.Session.liftedTree1$1(Session.scala:107)
at org.apache.livy.repl.Session.interpreter(Session.scala:98)
at org.apache.livy.repl.Session.org$apache$livy$repl$Session$$setJobGroup(Session.scala:353)
at org.apache.livy.repl.Session$$anonfun$execute$1.apply$mcV$sp(Session.scala:164)
at org.apache.livy.repl.Session$$anonfun$execute$1.apply(Session.scala:163)
at org.apache.livy.repl.Session$$anonfun$execute$1.apply(Session.scala:163)
at scala.concurrent.impl.Future$PromiseCompletingRunnable.liftedTree1$1(Future.scala:24)
at scala.concurrent.impl.Future$PromiseCompletingRunnable.run(Future.scala:24)
at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149)
at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)
at java.lang.Thread.run(Thread.java:748)
Exception in thread "SparkR backend" java.lang.ClassCastException: scala.Tuple2 cannot be cast to java.lang.Integer
at scala.runtime.BoxesRunTime.unboxToInt(BoxesRunTime.java:101)
at org.apache.livy.repl.SparkRInterpreter$$anon$1.run(SparkRInterpreter.scala:83)
18/08/16 03:06:28 WARN Session: Fail to start interpreter sparkr
java.lang.IllegalArgumentException: requirement failed: Cannot find sparkr package directory.
at scala.Predef$.require(Predef.scala:224)
at org.apache.livy.repl.SparkRInterpreter$.apply(SparkRInterpreter.scala:108)
at org.apache.livy.repl.Session.liftedTree1$1(Session.scala:107)
at org.apache.livy.repl.Session.interpreter(Session.scala:98)
at org.apache.livy.repl.Session$$anonfun$execute$1.apply$mcV$sp(Session.scala:168)
at org.apache.livy.repl.Session$$anonfun$execute$1.apply(Session.scala:163)
at org.apache.livy.repl.Session$$anonfun$execute$1.apply(Session.scala:163)
at scala.concurrent.impl.Future$PromiseCompletingRunnable.liftedTree1$1(Future.scala:24)
at scala.concurrent.impl.Future$PromiseCompletingRunnable.run(Future.scala:24)
at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149)
at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)
at java.lang.Thread.run(Thread.java:748)