[SPARK-5185] pyspark --jars does not add classes to driver class path

    Details

    • Type: Bug
    • Status: Resolved
    • Priority: Major
    • Resolution: Fixed
    • Affects Version/s: 1.2.0
    • Fix Version/s: 2.0.0
    • Component/s: PySpark
    • Labels: None
    • Target Version/s:

      Description

      I have some random class I want access to from a Spark shell, say com.cloudera.science.throwaway.ThrowAway. You can find the specific example I used here:

      https://gist.github.com/laserson/e9e3bd265e1c7a896652

      I packaged it as throwaway.jar.

      If I then run bin/spark-shell like so:

      bin/spark-shell --master local[1] --jars throwaway.jar
      

      I can execute

      val a = new com.cloudera.science.throwaway.ThrowAway()
      

      successfully.

      I now run PySpark like so:

      PYSPARK_DRIVER_PYTHON=ipython bin/pyspark --master local[1] --jars throwaway.jar
      

      which gives me an error when I try to instantiate the class through Py4J:

      In [1]: sc._jvm.com.cloudera.science.throwaway.ThrowAway()
      ---------------------------------------------------------------------------
      Py4JError                                 Traceback (most recent call last)
      <ipython-input-1-4eedbe023c29> in <module>()
      ----> 1 sc._jvm.com.cloudera.science.throwaway.ThrowAway()
      
      /Users/laserson/repos/spark/python/lib/py4j-0.8.2.1-src.zip/py4j/java_gateway.py in __getattr__(self, name)
          724     def __getattr__(self, name):
          725         if name == '__call__':
      --> 726             raise Py4JError('Trying to call a package.')
          727         new_fqn = self._fqn + '.' + name
          728         command = REFLECTION_COMMAND_NAME +\
      
      Py4JError: Trying to call a package.
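
      The traceback shows Py4J treating the unresolved name as a package rather than a class. A minimal sketch of a check for that condition follows. It assumes Py4J hands back an object whose type is named JavaPackage when it cannot resolve a dotted name to a class; the helper name looks_like_unresolved_class is hypothetical, and the check is on the type name only so the snippet does not need py4j installed.

```python
def looks_like_unresolved_class(obj):
    """Return True if a Py4J lookup appears to have fallen back to a package.

    Assumption: Py4J represents a dotted name it cannot resolve to a class
    as an object of type JavaPackage. Only the type name is inspected, so
    this is a heuristic, not an official Py4J API.
    """
    return type(obj).__name__ == "JavaPackage"


# Usage inside a PySpark shell (needs a live SparkContext `sc`, so not run here):
#   candidate = sc._jvm.com.cloudera.science.throwaway.ThrowAway
#   if looks_like_unresolved_class(candidate):
#       print("class is not visible on the driver class path")
```

      Checking before calling avoids the Py4JError and makes it clearer that the jar never reached the driver JVM.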
      

      However, if I also pass the same jar explicitly via --driver-class-path:

      PYSPARK_DRIVER_PYTHON=ipython bin/pyspark --master local[1] --jars throwaway.jar --driver-class-path throwaway.jar
      

      it works:

      In [1]: sc._jvm.com.cloudera.science.throwaway.ThrowAway()
      Out[1]: JavaObject id=o18
      

      The docs, however, state that --jars alone should also put the jar on the driver class path, so the extra --driver-class-path flag should not be needed.
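
      Until this is fixed, the workaround from this report can be sketched as a small launcher helper that passes every jar through both flags. The function name pyspark_command and its parameters are hypothetical. The sketch assumes --jars takes a comma-separated list and --driver-class-path takes entries joined with the platform class path separator.

```python
import os


def pyspark_command(jars, master="local[1]", pyspark_bin="bin/pyspark"):
    """Build a pyspark argument list that repeats the jars on the driver class path."""
    joined_jars = ",".join(jars)             # --jars: comma-separated list
    joined_classpath = os.pathsep.join(jars)  # class path: ':' on Unix, ';' on Windows
    return [
        pyspark_bin,
        "--master", master,
        "--jars", joined_jars,
        "--driver-class-path", joined_classpath,
    ]


# Usage (not run here; needs a Spark checkout):
#   import subprocess
#   env = dict(os.environ, PYSPARK_DRIVER_PYTHON="ipython")
#   subprocess.call(pyspark_command(["throwaway.jar"]), env=env)
```

      Building the argument list in one place keeps the two flags in sync, which is the part that is easy to forget when typing the command by hand.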

              People

              • Assignee: Josh Rosen (joshrosen)
              • Reporter: Uri Laserson (laserson)
              • Votes: 4
              • Watchers: 26
