Details
Type: Bug
Status: Open
Priority: Major
Resolution: Unresolved
Description
The documentation at https://zeppelin.apache.org/docs/0.8.0/interpreter/spark.html#2-loading-spark-properties states that the "--jars" option is sufficient to load jars for both the Spark driver and the executors.
However, when running in yarn mode, we see the following:
We added a very basic (single-class) jar to SPARK_SUBMIT_OPTIONS in conf/zeppelin-env.sh, using --jars "/home/hadoop/my-app-1.0-SNAPSHOT.jar", to preload the required class com.mycompany.app.App.
The jar seems to be distributed correctly to the spark/yarn cluster and is available on the classpath.
However, the following statement in a Zeppelin Spark notebook still fails:

    import com.mycompany.app.App

    <console>:23: error: object mycompany is not a member of package com
    import com.mycompany.app.App
To make the above import work, we had to open the Zeppelin interpreter configuration in the UI and add key=spark.jars with value=/home/hadoop/my-app-1.0-SNAPSHOT.jar.
The import worked fine after that.
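Expressed as a plain interpreter property rather than a UI entry, the workaround amounts to this setting (key and value exactly as we entered them):

    # Spark interpreter property added through the Zeppelin UI
    spark.jars=/home/hadoop/my-app-1.0-SNAPSHOT.jar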
We also compared the difference between an explicit z.load("/home/hadoop/my-app-1.0-SNAPSHOT.jar") statement and the pre-loading method above.
Explicit loading of additional jars through z.load() does work properly in yarn mode. Again, the difference between the pre-load approach via the "--jars" option and z.load() appears to be the spark.jars setting.
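For completeness, the explicit loading that does work is a standard %dep paragraph run before the Spark interpreter starts (a sketch equivalent to what we ran):

    %dep
    z.load("/home/hadoop/my-app-1.0-SNAPSHOT.jar")

    %spark
    import com.mycompany.app.App   // resolves once the jar is loaded above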
I'd assume that for jar pre-loading to work properly, Zeppelin would need to provide a different mechanism, since the SPARK_SUBMIT_OPTIONS approach alone does not seem to be sufficient.