Uploaded image for project: 'Spark'
  1. Spark
  2. SPARK-19439

PySpark's registerJavaFunction Should Support UDAFs

    XMLWordPrintableJSON

Details

    • Improvement
    • Status: Resolved
    • Major
    • Resolution: Fixed
    • 2.1.0
    • 2.3.0
    • PySpark
    • None

    Description

      When trying to import a Scala UDAF using registerJavaFunction, I get this error:

      In [1]: sqlContext.registerJavaFunction('geo_mean', 'com.foo.bar.GeometricMean')
      ---------------------------------------------------------------------------
      Py4JJavaError                             Traceback (most recent call last)
      <ipython-input-1-1f793218436f> in <module>()
      ----> 1 sqlContext.registerJavaFunction('geo_mean', 'com.foo.bar.GeometricMean')
      
      /home/kfb/src/projects/spark/python/pyspark/sql/context.pyc in registerJavaFunction(self, name, javaClassName, returnType)
          227         if returnType is not None:
          228             jdt = self.sparkSession._jsparkSession.parseDataType(returnType.json())
      --> 229         self.sparkSession._jsparkSession.udf().registerJava(name, javaClassName, jdt)
          230
          231     # TODO(andrew): delete this once we refactor things to take in SparkSession
      
      /home/kfb/src/projects/spark/python/lib/py4j-0.10.4-src.zip/py4j/java_gateway.py in __call__(self, *args)
         1131         answer = self.gateway_client.send_command(command)
         1132         return_value = get_return_value(
      -> 1133             answer, self.gateway_client, self.target_id, self.name)
         1134
         1135         for temp_arg in temp_args:
      
      /home/kfb/src/projects/spark/python/pyspark/sql/utils.pyc in deco(*a, **kw)
           61     def deco(*a, **kw):
           62         try:
      ---> 63             return f(*a, **kw)
           64         except py4j.protocol.Py4JJavaError as e:
           65             s = e.java_exception.toString()
      
      /home/kfb/src/projects/spark/python/lib/py4j-0.10.4-src.zip/py4j/protocol.py in get_return_value(answer, gateway_client, target_id, name)
          317                 raise Py4JJavaError(
          318                     "An error occurred while calling {0}{1}{2}.\n".
      --> 319                     format(target_id, ".", name), value)
          320             else:
          321                 raise Py4JError(
      
      Py4JJavaError: An error occurred while calling o28.registerJava.
      : java.io.IOException: UDF class com.foo.bar.GeometricMean doesn't implement any UDF interface
      	at org.apache.spark.sql.UDFRegistration.registerJava(UDFRegistration.scala:438)
      	at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
      	at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
      	at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
      	at java.lang.reflect.Method.invoke(Method.java:498)
      	at py4j.reflection.MethodInvoker.invoke(MethodInvoker.java:244)
      	at py4j.reflection.ReflectionEngine.invoke(ReflectionEngine.java:357)
      	at py4j.Gateway.invoke(Gateway.java:280)
      	at py4j.commands.AbstractCommand.invokeMethod(AbstractCommand.java:132)
      	at py4j.commands.CallCommand.execute(CallCommand.java:79)
      	at py4j.GatewayConnection.run(GatewayConnection.java:214)
      	at java.lang.Thread.run(Thread.java:745)
      

      According to SPARK-10915, UDAFs in Python aren't happening anytime soon. Without this, there's no way to get Scala UDAFs into Python Spark SQL whatsoever. Fixing that would be a huge help so that we can keep aggregations in the JVM and using DataFrames. Otherwise, all our code has to drop to to RDDs and live in Python.

      Attachments

        Activity

          People

            zjffdu Jeff Zhang
            kfb Keith Bourgoin
            Votes:
            4 Vote for this issue
            Watchers:
            3 Start watching this issue

            Dates

              Created:
              Updated:
              Resolved: