Spark / SPARK-21394

Reviving broken callable objects in UDF in PySpark


Details

    • Type: Bug
    • Status: Resolved
    • Priority: Major
    • Resolution: Fixed
    • Affects Version/s: 2.3.0
    • Fix Version/s: 2.3.0
    • Component/s: PySpark
    • Labels: None

    Description

      After SPARK-19161, callable objects can no longer be used as UDFs in Python. functions.udf fails with an AttributeError because the object has no __name__ attribute, as below:

      >>> from pyspark.sql import functions
      >>> class F(object):
      ...     def __call__(self, x):
      ...         return x
      ...
      >>> foo = F()
      >>> foo(1)
      1
      >>> udf = functions.udf(foo)
      Traceback (most recent call last):
        File "<stdin>", line 1, in <module>
        File ".../spark/python/pyspark/sql/functions.py", line 2142, in udf
          return _udf(f=f, returnType=returnType)
        File ".../spark/python/pyspark/sql/functions.py", line 2133, in _udf
          return udf_obj._wrapped()
        File ".../spark/python/pyspark/sql/functions.py", line 2090, in _wrapped
          @functools.wraps(self.func)
        File "/System/Library/Frameworks/Python.framework/Versions/2.7/lib/python2.7/functools.py", line 33, in update_wrapper
          setattr(wrapper, attr, getattr(wrapped, attr))
      AttributeError: F instance has no attribute '__name__'
      

      Note that this works in Spark 2.1 as below:

      >>> from pyspark.sql import functions
      >>> class F(object):
      ...     def __call__(self, x):
      ...         return x
      ...
      >>> foo = F()
      >>> foo(1)
      1
      >>> udf = functions.udf(foo)
      >>> spark.range(1).select(udf("id")).show()
      +-----+
      |F(id)|
      +-----+
      |    0|
      +-----+
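
The traceback above comes from functools.wraps copying __name__ from an object that does not define it. On Python 2.7 that copy raises AttributeError, while Python 3.2 and later skip missing attributes. The sketch below shows one way a wrapper could tolerate callable instances. It is an illustration only, not necessarily the change that resolved this issue. The helper names display_name and wrap_callable are hypothetical and are not part of PySpark.

```python
import functools


class F(object):
    def __call__(self, x):
        return x


def display_name(f):
    # Hypothetical helper: plain functions carry __name__, callable
    # instances usually do not, so fall back to the class name.
    return getattr(f, "__name__", None) or f.__class__.__name__


def wrap_callable(f):
    # Hypothetical helper: copy only the attributes that exist on f,
    # so a missing __name__ cannot raise AttributeError on Python 2.
    assigned = tuple(a for a in functools.WRAPPER_ASSIGNMENTS if hasattr(f, a))

    @functools.wraps(f, assigned=assigned)
    def wrapper(*args, **kwargs):
        return f(*args, **kwargs)

    # Set the name explicitly, since it may not have been copied.
    wrapper.__name__ = display_name(f)
    return wrapper


foo = F()
wrapped = wrap_callable(foo)
```

Falling back to the class name keeps the result consistent with the F(id) column shown in the Spark 2.1 output.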
      

          People

            Assignee: Hyukjin Kwon (gurwls223)
            Reporter: Hyukjin Kwon (gurwls223)