Details
- Type: Bug
- Status: Resolved
- Priority: Major
- Resolution: Not A Problem
- Affects Version: 0.10.1
Description
Hey,
I have the following setup:
Spark 3.1.2 standalone cluster (1 master, 2 workers)
Zeppelin 0.10.1
Spark interpreter setting:
SPARK_HOME /opt/spark (points to spark-3.1.2)
spark.master spark://my-spark-master-host:7077
spark.submit.deployMode client
spark.jars.packages com.datastax.spark:spark-cassandra-connector_2.12:3.1.0,eu.europa.ec.joinup.sd-dss:dss-xades:5.9
spark.jars.repositories https://ec.europa.eu/cefdigital/artifact/content/repositories/esignaturedss/
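For comparison, my understanding is that these interpreter properties correspond to a spark-submit invocation roughly like the one below (a sketch only; the application jar name is a placeholder, not something from my actual setup):

```shell
# Sketch of the spark-submit equivalent of the interpreter properties above.
# "my-app.jar" is a placeholder application jar.
/opt/spark/bin/spark-submit \
  --master spark://my-spark-master-host:7077 \
  --deploy-mode client \
  --packages com.datastax.spark:spark-cassandra-connector_2.12:3.1.0,eu.europa.ec.joinup.sd-dss:dss-xades:5.9 \
  --repositories https://ec.europa.eu/cefdigital/artifact/content/repositories/esignaturedss/ \
  my-app.jar
```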
I get the correct output if I run a cell like the one below. Cells which compute Pi from the Spark examples also work fine.

%spark
sc.version
Using classes from the additional packages provided via spark.jars.packages also works fine:

%spark
import com.datastax.spark.connector._
val rdd = sc.cassandraTable("mykeyspace", "mytable")
println(rdd.take(5).toList)
However, if I try to add a local jar via the spark.jars property as follows:

spark.jars file:///absolute/path/to/my/custom/jar

then the jars provided via spark.jars.packages are no longer part of the SparkContext. The custom jar is located at the same path on both the worker and the Zeppelin host. If I run

%spark
sc.listJars().foreach(println)

without spark.jars set, I get the long list I expect (artifacts from the DataStax and EU repositories). However, if I restart the interpreter with the spark.jars option set, the cell above lists only my custom jar. The logs output the following:
INFO [2022-03-04 15:51:17,742] ({FIFOScheduler-interpreter_1815846009-Worker-1} SparkScala212Interpreter.scala[open]:68) - UserJars: file:/opt/zeppelin/interpreter/spark/spark-interpreter-0.10.1.jar:file:/opt/path/to/my/jar, LONG_LIST_OF_JARS_FROM_MAVEN.
...
Added JAR file:///path/to/my/custom/jar at spark://x.x.x.:xxx/jars/my-custom.jar with timestamp xxx
So it seems like the interpreter is aware of all of my jars but only adds the ones from the spark.jars property, whereas I would expect all of the jars to be added. If I omit the spark.jars option, I get an "Added JAR file:///..." entry for each jar resolved from spark.jars.packages.
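A workaround I am considering (unverified, and the path below is just the placeholder from above): leave spark.jars unset so the spark.jars.packages artifacts stay registered, and add the local jar at runtime via SparkContext.addJar. As far as I know, addJar only distributes the jar to the executors and does not put it on the driver classpath, so this may not fully replace spark.jars:

```scala
%spark
// Placeholder path; spark.jars is left unset in the interpreter setting,
// so the Maven artifacts from spark.jars.packages remain registered.
sc.addJar("file:///absolute/path/to/my/custom/jar")

// Check that both the custom jar and the Maven artifacts show up.
sc.listJars().foreach(println)
```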
In a previous Zeppelin version (0.8.1), I was able to configure all of this via the SPARK_SUBMIT_OPTIONS environment variable, e.g.:

SPARK_SUBMIT_OPTIONS="... --jars /abs/path/to/custom --packages cassandraconn,etc.. --repositories additional-repo"
Is this a bug, or am I converting these options in the wrong way?
Thank you!