Details
Type: Bug
Status: Open
Priority: Major
Resolution: Unresolved
Description
The documentation at https://zeppelin.apache.org/docs/0.8.0/interpreter/spark.html#2-loading-spark-properties states that the "--jars" option is sufficient to load jars for both the Spark driver and the executors.
However, when running in yarn mode, we see the following:
We added a very basic (single-class) jar to SPARK_SUBMIT_OPTIONS in conf/zeppelin-env.sh, using --jars "/home/hadoop/my-app-1.0-SNAPSHOT.jar", to preload the required class com.mycompany.app.App.
The jar seems to be distributed correctly to the spark/yarn cluster and is available on the classpath.
However, the following statement in a Zeppelin Spark notebook still fails:

    import com.mycompany.app.App

    <console>:23: error: object mycompany is not a member of package com
    import com.mycompany.app.App
To make the above import work, we had to open the Zeppelin interpreter configuration in the UI and add key=spark.jars with value=/home/hadoop/my-app-1.0-SNAPSHOT.jar.
The import worked fine after that.
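Expressed as a plain interpreter property rather than a UI entry, the workaround amounts to this setting (key and value exactly as we entered them):

    # Spark interpreter property added through the Zeppelin UI
    spark.jars=/home/hadoop/my-app-1.0-SNAPSHOT.jar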
We also compared the difference between an explicit z.load("/home/hadoop/my-app-1.0-SNAPSHOT.jar") statement and the pre-loading method above.
Explicit loading of additional jars through z.load() does work properly in yarn mode. Again, the difference between the pre-load approach via the "--jars" option and z.load() appears to be the spark.jars setting.
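For completeness, the explicit loading that does work is a standard %dep paragraph run before the Spark interpreter starts (a sketch equivalent to what we ran):

    %dep
    z.load("/home/hadoop/my-app-1.0-SNAPSHOT.jar")

    %spark
    import com.mycompany.app.App   // resolves once the jar is loaded above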
I'd assume that for jar pre-loading to work properly, Zeppelin would need to provide a different mechanism, since the SPARK_SUBMIT_OPTIONS approach alone does not seem to be sufficient.