Zeppelin / ZEPPELIN-1883

Can't import packages requested by SPARK_SUBMIT_OPTIONS in pyspark

    Details

    • Type: Bug
    • Status: Resolved
    • Priority: Major
    • Resolution: Fixed
    • Affects Version/s: None
    • Fix Version/s: 0.7.0
    • Component/s: pySpark
    • Labels:
      None

      Description

      Zeppelin's pyspark interpreter can't import packages submitted via SPARK_SUBMIT_OPTIONS. For example,

      // conf/zeppelin-env.sh
      ...
      
      export SPARK_HOME="~/github/apache-spark/1.6.2-bin-hadoop2.6"
      export SPARK_SUBMIT_OPTIONS="--packages com.datastax.spark:spark-cassandra-connector_2.10:1.6.2,TargetHolding:pyspark-cassandra:0.3.5 --exclude-packages org.slf4j:slf4j-api"
      
      ...
      

      Then try to import the pyspark-cassandra module in the Zeppelin pyspark interpreter:

      import pyspark_cassandra
      
      
      Traceback (most recent call last):
        File "/var/folders/lr/8g9y625n5j39rz6qhkg8s6640000gn/T/zeppelin_pyspark-5266742863961917074.py", line 267, in <module>
          raise Exception(traceback.format_exc())
      Exception: Traceback (most recent call last):
        File "/var/folders/lr/8g9y625n5j39rz6qhkg8s6640000gn/T/zeppelin_pyspark-5266742863961917074.py", line 265, in <module>
          exec(code)
        File "<stdin>", line 1, in <module>
      ImportError: No module named pyspark_cassandra
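
      For context on why the import fails: the jars pulled in by `--packages` are plain zip archives, so making their pure-Python contents importable is just a matter of putting each jar path on `sys.path` (Python's zipimport handles the rest) — which Zeppelin's pyspark bootstrap was not doing for SPARK_SUBMIT_OPTIONS jars. A minimal, self-contained sketch of the mechanism, using a made-up `demo_pkg` module rather than anything from this issue:

```python
import os
import sys
import tempfile
import zipfile

# A jar is just a zip archive, so zipimport can import pure-Python modules
# shipped inside it once the archive is on sys.path. 'demo_pkg' and the jar
# name below are illustrative only.
jar_path = os.path.join(tempfile.mkdtemp(), "demo-assembly.jar")
with zipfile.ZipFile(jar_path, "w") as jar:
    jar.writestr("demo_pkg/__init__.py", "ANSWER = 42\n")

sys.path.insert(0, jar_path)  # effectively what the eventual fix does per jar
import demo_pkg

print(demo_pkg.ANSWER)  # -> 42
```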
      

        Issue Links

          Activity

          Hide
          githubbot ASF GitHub Bot added a comment -

          GitHub user 1ambda opened a pull request:

          https://github.com/apache/zeppelin/pull/1831

          ZEPPELIN-1883 Can't import spark submitted packages in PySpark

              1. What is this PR for?

          Fixed importing packages in pyspark requested by `SPARK_SUBMIT_OPTIONS`

              2. What type of PR is it?
                [Bug Fix]
              3. Todos

          Nothing

              4. What is the Jira issue?

          ZEPPELIN-1883(https://issues.apache.org/jira/browse/ZEPPELIN-1883)

              5. How should this be tested?

          1. Set `SPARK_HOME` and `SPARK_SUBMIT_OPTIONS` in `conf/zeppelin-env.sh` like

          ```sh
          export SPARK_HOME="~/github/apache-spark/1.6.2-bin-hadoop2.6"
          export SPARK_SUBMIT_OPTIONS="--packages com.datastax.spark:spark-cassandra-connector_2.10:1.6.2,TargetHolding:pyspark-cassandra:0.3.5 --exclude-packages org.slf4j:slf4j-api"
          ```

          2. Test whether submitted packages can be imported or not

          ```
          %pyspark

          import pyspark_cassandra
          ```

              6. Screenshots (if appropriate)

          ```
          import pyspark_cassandra

          Traceback (most recent call last):
          File "/var/folders/lr/8g9y625n5j39rz6qhkg8s6640000gn/T/zeppelin_pyspark-5266742863961917074.py", line 267, in <module>
          raise Exception(traceback.format_exc())
          Exception: Traceback (most recent call last):
          File "/var/folders/lr/8g9y625n5j39rz6qhkg8s6640000gn/T/zeppelin_pyspark-5266742863961917074.py", line 265, in <module>
          exec(code)
          File "<stdin>", line 1, in <module>
          ImportError: No module named pyspark_cassandra
          ```

              7. Questions:
          • Do the license files need to be updated? - NO
          • Are there breaking changes for older versions? - NO
          • Does this need documentation? - NO

          You can merge this pull request into a Git repository by running:

          $ git pull https://github.com/1ambda/zeppelin ZEPPELIN-1883/cant-import-submitted-packages-in-pyspark

          Alternatively you can review and apply these changes as the patch at:

          https://github.com/apache/zeppelin/pull/1831.patch

          To close this pull request, make a commit to your master/trunk branch
          with (at least) the following in the commit message:

          This closes #1831


          commit c735bd54b1ce712641ae9d2c4b780d954c0e985c
          Author: 1ambda <1amb4a@gmail.com>
          Date: 2017-01-02T04:52:40Z

          fix: Import spark submit packages in pyspark


          githubbot ASF GitHub Bot added a comment -

          Github user astroshim commented on the issue:

          https://github.com/apache/zeppelin/pull/1831

          In my test, I got an `INFO [2017-01-02 09:08:12,358] ({Exec Default Executor} RemoteInterpreterManagedProcess.java[onProcessComplete]:164) - Interpreter process exited 0` error when I tried to run the paragraph.
          Maybe this error occurs when the libraries from the `SPARK_SUBMIT_OPTIONS` option couldn't be downloaded.
          Is this normal behavior?

          githubbot ASF GitHub Bot added a comment -

          Github user 1ambda commented on the issue:

          https://github.com/apache/zeppelin/pull/1831

          @astroshim Thanks for review!

          It's the expected behavior. If spark-submit isn't loaded properly, the spark interpreter will die without errors.

          githubbot ASF GitHub Bot added a comment -

          Github user zjffdu commented on the issue:

          https://github.com/apache/zeppelin/pull/1831

          @1ambda Spark doesn't support specifying Python packages through `--packages`; the correct usage is `--py-files`. Although this PR could resolve your issue, the issue here is not due to a Zeppelin bug; it is because of wrong usage of `--packages`.

          githubbot ASF GitHub Bot added a comment -

          Github user felixcheung commented on the issue:

          https://github.com/apache/zeppelin/pull/1831

          Right, I'm a bit concerned whether this would be the right fix for the issue.

          githubbot ASF GitHub Bot added a comment -

          Github user 1ambda commented on the issue:

          https://github.com/apache/zeppelin/pull/1831

          @zjffdu Thanks for the review.

          Then how can I load [pyspark-cassandra](https://github.com/TargetHolding/pyspark-cassandra#with-spark-packages) for pyspark?

          githubbot ASF GitHub Bot added a comment -

          Github user zjffdu commented on the issue:

          https://github.com/apache/zeppelin/pull/1831

          @1ambda Actually, pyspark-cassandra doesn't work for me in the pyspark shell. I guess it works for you because you have installed it locally.
          ```
          >>> import pyspark_cassandra
          Traceback (most recent call last):
          File "<stdin>", line 1, in <module>
          ImportError: No module named pyspark_cassandra
          ```

          githubbot ASF GitHub Bot added a comment -

          Github user 1ambda commented on the issue:

          https://github.com/apache/zeppelin/pull/1831

          @zjffdu

          I've just created a gist to show that the `--packages` option downloads pyspark-cassandra: https://gist.github.com/1ambda/5caf92753ea2f95ada11b1c13945d261

          ```
          downloading https://repo1.maven.org/maven2/com/datastax/spark/spark-cassandra-connector_2.10/1.6.2/spark-cassandra-connector_2.10-1.6.2.jar ...
          [SUCCESSFUL ] com.datastax.spark#spark-cassandra-connector_2.10;1.6.2!spark-cassandra-connector_2.10.jar (450ms)
          downloading http://dl.bintray.com/spark-packages/maven/TargetHolding/pyspark-cassandra/0.3.5/pyspark-cassandra-0.3.5.jar ...
          [SUCCESSFUL ] TargetHolding#pyspark-cassandra;0.3.5!pyspark-cassandra.jar (310ms)
          downloading https://repo1.maven.org/maven2/com/datastax/spark/spark-cassandra-connector-java_2.10/1.6.0-M1/spark-cassandra-connector-java_2.10-1.6.0-M1.jar ...
          [SUCCESSFUL ] com.datastax.spark#spark-cassandra-connector-java_2.10;1.6.0-M1!spark-cassandra-connector-java_2.10.jar (23ms)
          downloading https://repo1.maven.org/maven2/com/datastax/cassandra/cassandra-driver-core/3.0.0/cassandra-driver-core-3.0.0.jar ...
          [SUCCESSFUL ] com.datastax.cassandra#cassandra-driver-core;3.0.0!cassandra-driver-core.jar(bundle) (78ms)
          :: resolution report :: resolve 2819ms :: artifacts dl 870ms
          ```

          githubbot ASF GitHub Bot added a comment -

          Github user zjffdu commented on the issue:

          https://github.com/apache/zeppelin/pull/1831

          Hmm, it works in local mode but doesn't work in yarn-client mode. Could you try yarn-client mode?

          githubbot ASF GitHub Bot added a comment -

          Github user 1ambda commented on the issue:

          https://github.com/apache/zeppelin/pull/1831

          I tested on yarn-client and mesos-client, and found that:

          • mesos-client mode copies pyspark-cassandra submitted by `--packages`, as you can see [here](https://gist.github.com/1ambda/e3326107d14ece9a39663cbc56f05756) (the error below is due to an invalid Python version, not a problem of Spark or pyspark-cassandra):

          ```python
          Using Python version 2.6.6 (r266:84292, Aug 18 2016 15:13:37)
          SparkContext available as sc, HiveContext available as sqlContext.
          >>> import pyspark_cassandra
          Traceback (most recent call last):
          File "<stdin>", line 1, in <module>
          File "/tmp/spark-df7bc8fa-233f-4124-855b-4a39fa948c1a/userFiles-ab70ffa3-212b-47ee-9611-9c240d3ce899/TargetHolding_pyspark-cassandra-0.3.5.jar/pyspark_cassandra/__init__.py", line 24, in <module>
          File "/tmp/spark-df7bc8fa-233f-4124-855b-4a39fa948c1a/userFiles-ab70ffa3-212b-47ee-9611-9c240d3ce899/TargetHolding_pyspark-cassandra-0.3.5.jar/pyspark_cassandra/context.py", line 16, in <module>
          File "/tmp/spark-df7bc8fa-233f-4124-855b-4a39fa948c1a/userFiles-ab70ffa3-212b-47ee-9611-9c240d3ce899/TargetHolding_pyspark-cassandra-0.3.5.jar/pyspark_cassandra/rdd.py", line 291
          k = Row(**{c: row.__getattr__(c) for c in columns})
                    ^
          SyntaxError: invalid syntax
          >>>
          ```
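
          The `SyntaxError` above comes from dict comprehensions, which require Python 2.7 or later; the banner shows Python 2.6.6, so `pyspark_cassandra`'s `rdd.py` simply cannot be parsed by that interpreter. A sketch of the failing form and a 2.6-compatible rewrite, where the `FakeRow` class and values are made up for illustration:

```python
# Dict comprehensions ({k: v for ...}) were added in Python 2.7; on 2.6 the
# same mapping must be built with dict() over a generator expression.
columns = ["name", "age"]

class FakeRow(object):
    # Hypothetical stand-in for a Cassandra row; not part of pyspark_cassandra.
    def __getattr__(self, name):
        return name.upper()

row = FakeRow()

# Python >= 2.7 / 3.x form (the shape of the line that fails on 2.6):
k_27 = {c: getattr(row, c) for c in columns}

# Python 2.6-compatible equivalent:
k_26 = dict((c, getattr(row, c)) for c in columns)

assert k_27 == k_26 == {"name": "NAME", "age": "AGE"}
```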

          • yarn-client mode doesn't copy pyFiles, as you can see [here](https://github.com/apache/spark/blob/master/core/src/main/scala/org/apache/spark/deploy/SparkSubmit.scala#L354-L357):

          ```scala
          // If we're running a python app, set the main class to our specific python runner
          if (args.isPython && deployMode == CLIENT) {
            ...
            if (clusterManager != YARN) {
              // The YARN backend handles python files differently, so don't merge the lists.
              args.files = mergeFileLists(args.files, args.pyFiles)
            }

          ```

            1. Summary

          @zjffdu @felixcheung

          1. I am not sure why they decided not to copy py-files in yarn-client mode, but it's a problem of Spark, not Zeppelin.
          2. As you saw, this is *expected behavior*, at least in local and mesos-client modes.

          githubbot ASF GitHub Bot added a comment -

          Github user 1ambda commented on the issue:

          https://github.com/apache/zeppelin/pull/1831

          Any update on this?

          githubbot ASF GitHub Bot added a comment -

          Github user zjffdu commented on the issue:

          https://github.com/apache/zeppelin/pull/1831

          I still think this is not a correct fix, since it doesn't resolve yarn-client mode, which I believe is the mode most users use.

          githubbot ASF GitHub Bot added a comment -

          Github user 1ambda commented on the issue:

          https://github.com/apache/zeppelin/pull/1831

          @zjffdu

          > since it doesn't resolve the yarn-client mode

          1. PySpark also doesn't support extending PYTHONPATH in yarn-client.
          2. You keep saying this is not the right fix without suggesting any alternative, so let me ask:

          • How can you load pyspark-cassandra using `--packages`, as described in their README.md, in local and mesos-client modes?
          githubbot ASF GitHub Bot added a comment -

          Github user zjffdu commented on the issue:

          https://github.com/apache/zeppelin/pull/1831

          As I said before, why not use `--py-files`? I checked the repository of pyspark-cassandra:
          https://github.com/TargetHolding/pyspark-cassandra

          The README shows that users can use `--py-files`:

          ```
          spark-submit \
            --jars /path/to/pyspark-cassandra-assembly-<version>.jar \
            --driver-class-path /path/to/pyspark-cassandra-assembly-<version>.jar \
            --py-files /path/to/pyspark-cassandra-assembly-<version>.jar \
            --conf spark.cassandra.connection.host=your,cassandra,node,names \
            --master spark://spark-master:7077 \
            yourscript.py
          ```

          githubbot ASF GitHub Bot added a comment -

          Github user 1ambda commented on the issue:

          https://github.com/apache/zeppelin/pull/1831

          1. I read and replied before.

          > Q. The README shows that users can use --py-files
          > A. Users cannot benefit from --packages that way. They would need to download all transitive deps, find their locations, and provide the paths to --py-files.

          And even in Spark, we can use `--packages` in local and mesos-client modes. Why do you think Zeppelin shouldn't?

          2. I tested this PR in yarn-client and it works. How did you test this PR in yarn-client?

          > since it doesn't resolve the yarn-client mode

          Could you tell me your env?

          • how did you build (command, env)
          • zeppelin, yarn, spark versions.
          githubbot ASF GitHub Bot added a comment -

          Github user zjffdu commented on the issue:

          https://github.com/apache/zeppelin/pull/1831

          Sorry, I missed your last reply. Do you mean yarn-client mode works for you in Spark?
          I used the following command to launch pyspark and got the following error.

          Launch pyspark (I am using Spark 2.1.0):
          ```
          bin/pyspark --packages com.datastax.spark:spark-cassandra-connector_2.10:1.6.2,TargetHolding:pyspark-cassandra:0.3.5 --exclude-packages org.slf4j:slf4j-api --master yarn-client
          ```

          It fails to import pyspark_cassandra:
          ```
          >>> import pyspark_cassandra
          Traceback (most recent call last):
          File "<stdin>", line 1, in <module>
          ImportError: No module named pyspark_cassandra
          ```

          githubbot ASF GitHub Bot added a comment -

          Github user 1ambda commented on the issue:

          https://github.com/apache/zeppelin/pull/1831

          @zjffdu I've just fixed it so that, in yarn-client mode only, PYTHONPATH is not extended with the submitted packages.

          githubbot ASF GitHub Bot added a comment -

          Github user zjffdu commented on the issue:

          https://github.com/apache/zeppelin/pull/1831

          Thanks, @1ambda. Do you mind creating a Spark ticket as well? The behavior inconsistency between different modes seems to be a Spark issue; we need to clarify it with the Spark community.

          githubbot ASF GitHub Bot added a comment -

          Github user 1ambda commented on the issue:

          https://github.com/apache/zeppelin/pull/1831

          *For reviewers*

          Fixed to use `spark.jars` instead of `classpath`.

          • classpath doesn't include the submitted jars at this moment (it did 7 days ago, but not now)
          • it simplifies the logic, since we don't need to read the old classpath before setting a new one. In other words, we can directly set PYTHONPATH in the `setupPySparkEnv` function.
          • also tested on Spark 1.6.2 and Spark 2.0.0
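
          A rough sketch of what "directly set PYTHONPATH in `setupPySparkEnv`" amounts to: read the comma-separated jar list from the `spark.jars` property and append it to PYTHONPATH. All names below are illustrative and do not match Zeppelin's actual source:

```python
import os

def extend_pythonpath_with_spark_jars(conf_jars, environ):
    # Hypothetical helper: append each jar listed in the spark.jars property
    # to PYTHONPATH so pure-Python modules shipped inside them (e.g.
    # pyspark_cassandra) become importable via zipimport.
    jars = [j for j in conf_jars.split(",") if j.endswith(".jar")]
    parts = [environ["PYTHONPATH"]] if environ.get("PYTHONPATH") else []
    environ["PYTHONPATH"] = os.pathsep.join(parts + jars)
    return environ

env = extend_pythonpath_with_spark_jars(
    "/tmp/spark-cassandra-connector_2.10-1.6.2.jar,/tmp/pyspark-cassandra-0.3.5.jar",
    {"PYTHONPATH": "/opt/spark/python"},
)
print(env["PYTHONPATH"])
```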

          @zjffdu

          The Spark code we talked about came from https://github.com/apache/spark/pull/6360. It seems intentional, so it's OK not to raise an issue.

          githubbot ASF GitHub Bot added a comment -

          Github user asfgit closed the pull request at:

          https://github.com/apache/zeppelin/pull/1831


            People

            • Assignee: Hoon Park (1ambda)
            • Reporter: Hoon Park (1ambda)
            • Votes: 0
            • Watchers: 3
