ZEPPELIN-1883: Can't import packages requested by SPARK_SUBMIT_OPTIONS in pyspark

    Details

    • Type: Bug
    • Status: Resolved
    • Priority: Major
    • Resolution: Fixed
    • Affects Version/s: None
    • Fix Version/s: 0.7.0
    • Component/s: pySpark
    • Labels: None

      Description

      The Zeppelin pyspark interpreter can't import packages submitted via SPARK_SUBMIT_OPTIONS. For example,

      # conf/zeppelin-env.sh
      ...
      
      export SPARK_HOME="~/github/apache-spark/1.6.2-bin-hadoop2.6"
      export SPARK_SUBMIT_OPTIONS="--packages com.datastax.spark:spark-cassandra-connector_2.10:1.6.2,TargetHolding:pyspark-cassandra:0.3.5 --exclude-packages org.slf4j:slf4j-api"
      
      ...
      
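      For comparison, the same coordinates are expected to import from a stock pyspark shell started with equivalent flags. The invocation below is a minimal check sketch inferred from the configuration above, not part of the original report.

      # Roughly the invocation that SPARK_SUBMIT_OPTIONS above maps to
      # (run from a terminal; assumption, mirrors the config):
      #   $SPARK_HOME/bin/pyspark \
      #     --packages com.datastax.spark:spark-cassandra-connector_2.10:1.6.2,TargetHolding:pyspark-cassandra:0.3.5 \
      #     --exclude-packages org.slf4j:slf4j-api
      #
      # In that shell the import is expected to resolve, since spark-submit adds
      # the resolved --packages jars to the Python path for Python applications.
      import pyspark_cassandra
      print(pyspark_cassandra.__file__)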

      Then try to import the pyspark_cassandra module in the Zeppelin pyspark interpreter:

      import pyspark_cassandra
      
      
      Traceback (most recent call last):
        File "/var/folders/lr/8g9y625n5j39rz6qhkg8s6640000gn/T/zeppelin_pyspark-5266742863961917074.py", line 267, in <module>
          raise Exception(traceback.format_exc())
      Exception: Traceback (most recent call last):
        File "/var/folders/lr/8g9y625n5j39rz6qhkg8s6640000gn/T/zeppelin_pyspark-5266742863961917074.py", line 265, in <module>
          exec(code)
        File "<stdin>", line 1, in <module>
      ImportError: No module named pyspark_cassandra
      
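      A hypothetical workaround sketch (not the fix that went into 0.7.0; the Ivy cache path and jar name pattern are assumptions): assuming the pyspark_cassandra Python modules are bundled inside the package jar, as spark-packages Python packages typically are, the import should resolve once that jar is on sys.path of the interpreter process and shipped to the executors.

      # Hypothetical workaround, run inside a %pyspark paragraph.
      # Assumes spark-submit cached the --packages jars under ~/.ivy2/jars
      # and that `sc` is the SparkContext Zeppelin already provides.
      import glob
      import os
      import sys

      for jar in glob.glob(os.path.expanduser(
              "~/.ivy2/jars/TargetHolding_pyspark-cassandra-*.jar")):
          sys.path.insert(0, jar)   # let zipimport find the bundled .py files
          sc.addPyFile(jar)         # ship the same jar to the executors

      import pyspark_cassandra      # should now resolve on the driver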

    People

    • Assignee: 1ambda Hoon Park
    • Reporter: 1ambda Hoon Park
    • Votes: 0
    • Watchers: 3
