Zeppelin / ZEPPELIN-5666

Spark Additional jars: Does spark.jars override spark.jars.packages?


Details

    • Type: Bug
    • Status: Resolved
    • Priority: Major
    • Resolution: Not A Problem
    • Affects Version/s: 0.10.1
    • Fix Version/s: None
    • Component/s: None
    • Labels: None

    Description

      Hey,

      I got the following setup:

      Spark 3.1.2 Standalone Cluster (1 Master, 2 Workers)

      Zeppelin 0.10.1

      Spark interpreter settings:

      SPARK_HOME               /opt/spark (points to spark-3.1.2)
      spark.master             spark://my-spark-master-host:7077
      spark.submit.deployMode  client
      spark.jars.packages      com.datastax.spark:spark-cassandra-connector_2.12:3.1.0,eu.europa.ec.joinup.sd-dss:dss-xades:5.9
      spark.jars.repositories  https://ec.europa.eu/cefdigital/artifact/content/repositories/esignaturedss/
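
      To double-check what the running interpreter actually receives, the jar-related properties can be printed from a cell (a quick sanity check using only the standard SparkConf accessors; the output simply reflects whatever is configured above):

      %spark
      // Show the jar-related properties as the running SparkContext sees them
      println(sc.getConf.getOption("spark.jars").getOrElse("<unset>"))
      println(sc.getConf.getOption("spark.jars.packages").getOrElse("<unset>"))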

      I get the correct output if I run a cell like the one below, and cells which compute Pi from the Spark examples also work fine:

      %spark
      sc.version 

      Using classes from additional packages provided via spark.jars.packages works fine:

      %spark
      import com.datastax.spark.connector._
      
      val rdd = sc.cassandraTable("mykeyspace", "mytable")
      println(rdd.take(5).toList) 

      However, if I try to add a local jar via the spark.jars property as follows

      spark.jars file:///absolute/path/to/my/custom/jar
      

      the jars provided via spark.jars.packages are no longer part of the SparkContext. The custom jar is located at the same path on the workers and on the Zeppelin host. If I run

      %spark
      sc.listJars().foreach(println) 

      without spark.jars set, I get the long list I expect (the jars from the DataStax and EU repositories). However, if I restart the interpreter with the spark.jars option set, the cell from above lists only my custom jar. The logs show the following:

      INFO [2022-03-04 15:51:17,742] ({FIFOScheduler-interpreter_1815846009-Worker-1} SparkScala212Interpreter.scala[open]:68) - UserJars: file:/opt/zeppelin/interpreter/spark/spark-interpreter-0.10.1.jar:file:/opt/path/to/my/jar, LONG_LIST_OF_JARS_FROM_MAVEN.
      
      ...
      
      Added JAR file:///path/to/my/custom/jar at spark://x.x.x.:xxx/jars/my-custom.jar with timestamp xxx 

      So it seems that the interpreter is aware of all of my jars, but only adds the ones from the spark.jars property, whereas I would expect all of the jars to be added. If I omit the spark.jars option, I get an Added JAR file:///... entry for each jar of the spark.jars.packages setting.
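
      As a workaround sketch (not verified to fix the underlying behaviour), the local jar could be added at runtime via sc.addJar, leaving spark.jars unset so that spark.jars.packages stays effective; the path below is the same placeholder as above:

      %spark
      // Leave spark.jars unset so the packages from spark.jars.packages are kept,
      // then ship the local jar to the executors at runtime instead.
      // Note: sc.addJar only distributes the jar to the executors; it does not
      // put the classes on the driver classpath.
      sc.addJar("file:///absolute/path/to/my/custom/jar")
      sc.listJars().foreach(println)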

      In a previous Zeppelin version (0.8.1), I was able to configure all of this via the SPARK_SUBMIT_OPTIONS environment variable, e.g.

      SPARK_SUBMIT_OPTIONS=" ... --jars /abs/path/to/custom --packages cassandraconn,etc.. --repositories additional-repo"
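
      My assumption is that these flags map one-to-one onto the interpreter properties I set above (the property names are standard Spark configuration; the values are the same placeholders as on the line above):

      --jars /abs/path/to/custom      ->  spark.jars
      --packages cassandraconn,etc..  ->  spark.jars.packages
      --repositories additional-repo  ->  spark.jars.repositories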

      Is this a bug, or am I translating these options incorrectly?

      Thank you!


          People

            Assignee: Unassigned
            Reporter: Lukas Heppe (lukas.heppe)
            Votes: 0
            Watchers: 2
