Details

    • Type: Sub-task
    • Status: Resolved
    • Priority: Major
    • Resolution: Fixed
    • Affects Version/s: 1.6.0, 2.0.0, 2.1.0, 2.2.0
    • Fix Version/s: 2.2.0
    • Component/s: PySpark, SQL
    • Labels: None

    Description

      Current state

      Right now `udf` returns a `UserDefinedFunction` object, which doesn't provide a meaningful docstring:

      In [1]: from pyspark.sql.types import IntegerType
      
      In [2]: from pyspark.sql.functions import udf
      
      In [3]: def _add_one(x):
         ...:     """Adds one"""
         ...:     if x is not None:
         ...:         return x + 1
         ...:
      
      In [4]: add_one = udf(_add_one, IntegerType())
      
      In [5]: ?add_one
      Type:        UserDefinedFunction
      String form: <pyspark.sql.functions.UserDefinedFunction object at 0x7f281ed2d198>
      File:        ~/Spark/spark-2.0/python/pyspark/sql/functions.py
      Signature:   add_one(*cols)
      Docstring:
      User defined function in Python
      
      .. versionadded:: 1.3
      
      In [6]: help(add_one)
      
      Help on UserDefinedFunction in module pyspark.sql.functions object:
      
      class UserDefinedFunction(builtins.object)
       |  User defined function in Python
       |  
       |  .. versionadded:: 1.3
       |  
       |  Methods defined here:
       |  
       |  __call__(self, *cols)
       |      Call self as a function.
       |  
       |  __del__(self)
       |  
       |  __init__(self, func, returnType, name=None)
       |      Initialize self.  See help(type(self)) for accurate signature.
       |  
       |  ----------------------------------------------------------------------
       |  Data descriptors defined here:
       |  
       |  __dict__
       |      dictionary for instance variables (if defined)
       |  
       |  __weakref__
       |      list of weak references to the object (if defined)
      (END)
      
      

      It is possible to extract the function:

      In [7]: ?add_one.func
      
      Signature: add_one.func(x)
      Docstring: Adds one
      File:      ~/Spark/spark-2.0/<ipython-input-3-d2d8e4c530ac>
      Type:      function
      
      In [8]: help(add_one.func)
      
      Help on function _add_one in module __main__:
      
      _add_one(x)
          Adds one
      

      but this assumes that the end user is aware of the distinction between UDFs and built-in functions.
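The behaviour shown in the session above follows from how Python resolves `__doc__` on an instance: the lookup falls through to the class, so every UDF reports the same class-level docstring. The following is a minimal sketch that reproduces this without Spark. `UserDefinedFunctionLike` is a hypothetical stand-in written for illustration, not the actual PySpark class.

```python
class UserDefinedFunctionLike:
    """User defined function in Python"""

    # Hypothetical stand-in for the object returned by `udf`; it only
    # stores the wrapped function and forwards calls to it.
    def __init__(self, func):
        self.func = func

    def __call__(self, *cols):
        return self.func(*cols)


def _add_one(x):
    """Adds one"""
    if x is not None:
        return x + 1


add_one = UserDefinedFunctionLike(_add_one)

# The instance has no docstring of its own, so the class docstring is used.
print(add_one.__doc__)       # User defined function in Python
# The original docstring is only reachable through the `func` attribute.
print(add_one.func.__doc__)  # Adds one
```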

      Proposed

      Copy the input function's docstring to the UDF object or function wrapper.

      In [1]: from pyspark.sql.types import IntegerType
      
      In [2]: from pyspark.sql.functions import udf
      
      In [3]: def _add_one(x):
         ...:     """Adds one"""
         ...:     if x is not None:
         ...:         return x + 1
         ...:
      
      In [4]: add_one = udf(_add_one, IntegerType())
      
      In [5]: ?add_one
      Signature: add_one(x)
      Docstring:
      Adds one
      
      SQL Type: IntegerType
      File:      ~/Workspace/spark/<ipython-input-3-d2d8e4c530ac>
      Type:      function
      
      In [6]: help(add_one)
      
      
      Help on function _add_one in module __main__:
      
      _add_one(x)
          Adds one
          
          SQL Type: IntegerType
      (END)
      
      

            People

              Assignee: zero323 Maciej Szymkiewicz
              Reporter: zero323 Maciej Szymkiewicz