
pyspark --jars does not add classes to driver class path


Details

    • Type: Bug
    • Status: Resolved
    • Priority: Major
    • Resolution: Fixed
    • Affects Version/s: 1.2.0
    • Fix Version/s: 2.0.0
    • Component/s: PySpark
    • Labels: None

    Description

      I have some arbitrary class I want to access from a Spark shell, say com.cloudera.science.throwaway.ThrowAway. You can find the specific example I used here:

      https://gist.github.com/laserson/e9e3bd265e1c7a896652

      I packaged it as throwaway.jar.

      If I then run bin/spark-shell like so:

      bin/spark-shell --master local[1] --jars throwaway.jar
      

      I can execute

      val a = new com.cloudera.science.throwaway.ThrowAway()
      

      successfully.

      I now run PySpark like so:

      PYSPARK_DRIVER_PYTHON=ipython bin/pyspark --master local[1] --jars throwaway.jar
      

      which gives me an error when I try to instantiate the class through Py4J:

      In [1]: sc._jvm.com.cloudera.science.throwaway.ThrowAway()
      ---------------------------------------------------------------------------
      Py4JError                                 Traceback (most recent call last)
      <ipython-input-1-4eedbe023c29> in <module>()
      ----> 1 sc._jvm.com.cloudera.science.throwaway.ThrowAway()
      
      /Users/laserson/repos/spark/python/lib/py4j-0.8.2.1-src.zip/py4j/java_gateway.py in __getattr__(self, name)
          724     def __getattr__(self, name):
          725         if name == '__call__':
      --> 726             raise Py4JError('Trying to call a package.')
          727         new_fqn = self._fqn + '.' + name
          728         command = REFLECTION_COMMAND_NAME +\
      
      Py4JError: Trying to call a package.
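
The message is easier to read with a rough model of how a dotted name is resolved. The sketch below is a toy stand-in, not Py4J's implementation: `FakeJavaPackage`, `FakeJavaClass`, `FakePy4JError` and `visible_classes` are hypothetical names made up for illustration. It assumes that any name the driver JVM cannot resolve to a class is treated as a package, so a class missing from the driver class path ends up being "called" as a package.

```python
class FakePy4JError(Exception):
    """Stand-in for Py4JError so the sketch runs without Py4J installed."""


class FakeJavaClass:
    """A name the pretend JVM resolved to a class; calling it 'instantiates' it."""

    def __init__(self, fqn):
        self._fqn = fqn

    def __call__(self):
        return "JavaObject of " + self._fqn


class FakeJavaPackage:
    """A name the pretend JVM could not resolve to a class, so it is assumed to be a package."""

    def __init__(self, fqn, visible_classes):
        self._fqn = fqn
        # Hypothetical: the set of class names visible on the driver class path.
        self._visible_classes = visible_classes

    def __getattr__(self, name):
        # Only reached for attributes that were not set in __init__.
        if name.startswith("__"):
            raise AttributeError(name)
        fqn = self._fqn + "." + name if self._fqn else name
        if fqn in self._visible_classes:
            return FakeJavaClass(fqn)
        # Unknown names fall through to "package", even if a class was intended.
        return FakeJavaPackage(fqn, self._visible_classes)

    def __call__(self):
        raise FakePy4JError("Trying to call a package.")


THROWAWAY = "com.cloudera.science.throwaway.ThrowAway"

# Jar missing from the driver class path: every path component looks like a package.
jvm_without_jar = FakeJavaPackage("", set())

# Jar present on the driver class path: the last component resolves to a class.
jvm_with_jar = FakeJavaPackage("", {THROWAWAY})
```

In this model, `jvm_without_jar.com.cloudera.science.throwaway.ThrowAway()` raises `FakePy4JError`, while the same expression on `jvm_with_jar` returns an object. Under that reading, the error points to a class path problem rather than a typo in the class name.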
      

      However, if I also pass the same jar explicitly via --driver-class-path:

      PYSPARK_DRIVER_PYTHON=ipython bin/pyspark --master local[1] --jars throwaway.jar --driver-class-path throwaway.jar
      

      it works:

      In [1]: sc._jvm.com.cloudera.science.throwaway.ThrowAway()
      Out[1]: JavaObject id=o18
      

      However, the docs state that --jars should also set the driver class path.
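
On affected versions, the workaround of passing the jar through both flags could be scripted roughly as follows. This is a minimal sketch: `pyspark_command` and its parameters are hypothetical helpers, not part of Spark. The one detail worth encoding is that `--jars` takes a comma-separated list, while `--driver-class-path` is an ordinary JVM class path joined with the platform separator (`os.pathsep`).

```python
import os


def pyspark_command(jars, master="local[1]", pyspark_bin="bin/pyspark"):
    """Build a pyspark argv that passes the same jars to both flags.

    Hypothetical helper: it only assembles the argument list and does not
    launch anything.
    """
    jars = list(jars)
    return [
        pyspark_bin,
        "--master", master,
        # --jars expects a comma-separated list of jar paths.
        "--jars", ",".join(jars),
        # --driver-class-path is a class path, so use the platform separator.
        "--driver-class-path", os.pathsep.join(jars),
    ]


cmd = pyspark_command(["throwaway.jar"])
```

The resulting list can be handed to `subprocess.call(cmd, env=...)` with `PYSPARK_DRIVER_PYTHON=ipython` set in the environment, which mirrors the command line shown above.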

            People

              Assignee: Josh Rosen (joshrosen)
              Reporter: Uri Laserson (laserson)
              Votes: 4
              Watchers: 22