Description
Current state
Right now `udf` returns a `UserDefinedFunction` object, which doesn't provide a meaningful docstring:
In [1]: from pyspark.sql.types import IntegerType

In [2]: from pyspark.sql.functions import udf

In [3]: def _add_one(x):
    """Adds one"""
    if x is not None:
        return x + 1
   ...:

In [4]: add_one = udf(_add_one, IntegerType())

In [5]: ?add_one
Type:        UserDefinedFunction
String form: <pyspark.sql.functions.UserDefinedFunction object at 0x7f281ed2d198>
File:        ~/Spark/spark-2.0/python/pyspark/sql/functions.py
Signature:   add_one(*cols)
Docstring:
User defined function in Python

.. versionadded:: 1.3

In [6]: help(add_one)
Help on UserDefinedFunction in module pyspark.sql.functions object:

class UserDefinedFunction(builtins.object)
 |  User defined function in Python
 |
 |  .. versionadded:: 1.3
 |
 |  Methods defined here:
 |
 |  __call__(self, *cols)
 |      Call self as a function.
 |
 |  __del__(self)
 |
 |  __init__(self, func, returnType, name=None)
 |      Initialize self.  See help(type(self)) for accurate signature.
 |
 |  ----------------------------------------------------------------------
 |  Data descriptors defined here:
 |
 |  __dict__
 |      dictionary for instance variables (if defined)
 |
 |  __weakref__
 |      list of weak references to the object (if defined)
(END)
It is possible to extract the function:
In [7]: ?add_one.func
Signature: add_one.func(x)
Docstring: Adds one
File:      ~/Spark/spark-2.0/<ipython-input-3-d2d8e4c530ac>
Type:      function

In [8]: help(add_one.func)
Help on function _add_one in module __main__:

_add_one(x)
    Adds one
but this assumes that the end user is aware of the distinction between UDFs and built-in functions.
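The behavior above can be reproduced without Spark. The following is only a sketch: `StandInUDF` is a hypothetical stand-in for `UserDefinedFunction`, it calls the wrapped function directly instead of building a Column expression, and the return type is a plain string placeholder. It shows why `?add_one` reports the class docstring: attribute lookup for `__doc__` on an instance falls through to the class.

```python
class StandInUDF:
    """User defined function in Python"""

    # Hypothetical stand-in, not the real pyspark class.
    def __init__(self, func, return_type):
        self.func = func
        self.return_type = return_type

    def __call__(self, *cols):
        # The real object would return a Column; here we just call func.
        return self.func(*cols)


def _add_one(x):
    """Adds one"""
    if x is not None:
        return x + 1


add_one = StandInUDF(_add_one, "IntegerType")

# The instance exposes the class docstring, not the user's.
print(add_one.__doc__)
# The user's docstring is only reachable through the wrapped function.
print(add_one.func.__doc__)
```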
Proposed
Copy the input function's docstring to the UDF object or function wrapper:
In [1]: from pyspark.sql.types import IntegerType

In [2]: from pyspark.sql.functions import udf

In [3]: def _add_one(x):
    """Adds one"""
    if x is not None:
        return x + 1
   ...:

In [4]: add_one = udf(_add_one, IntegerType())

In [5]: ?add_one
Signature: add_one(x)
Docstring:
Adds one

SQL Type: IntegerType
File:      ~/Workspace/spark/<ipython-input-3-d2d8e4c530ac>
Type:      function

In [6]: help(add_one)
Help on function _add_one in module __main__:

_add_one(x)
    Adds one

    SQL Type: IntegerType
(END)
Issue Links
- is related to SPARK-18777 Return UDF objects when registering from Python (Resolved)