  1. Spark
  2. SPARK-19159 PySpark UDF API improvements
  3. SPARK-19162

UserDefinedFunction constructor should verify that func is callable

    Details

    • Type: Sub-task
    • Status: Resolved
    • Priority: Minor
    • Resolution: Fixed
    • Affects Version/s: 1.6.0, 2.0.0, 2.1.0, 2.2.0
    • Fix Version/s: 2.2.0
    • Component/s: PySpark, SQL
    • Labels:
      None

      Description

      Current state

      Right now `UserDefinedFunction` doesn't perform any input type validation. It accepts non-callable objects, only to fail later, when the query runs, with a hard-to-understand traceback:

      In [1]: from pyspark.sql.functions import udf
      
      In [2]: df = spark.range(0, 1)
      
      In [3]: f = udf(None)
      
      In [4]: df.select(f()).first()
      17/01/07 19:30:50 ERROR Executor: Exception in task 2.0 in stage 2.0 (TID 7)
      
      ...
      Py4JJavaError: An error occurred while calling o51.collectToPython.
      ...
      TypeError: 'NoneType' object is not callable
      ...
      
      

      Proposed

      Apply basic validation to the `func` argument, so that a non-callable object is rejected as soon as the UDF is created:

      In [7]: udf(None)
      
      ---------------------------------------------------------------------------
      TypeError                                 Traceback (most recent call last)
      <ipython-input-7-0765fbe657a9> in <module>()
      ----> 1 udf(None)
      ...
      TypeError: func should be a callable object (a function or an instance of a class with __call__). Got <class 'NoneType'>
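
      The proposed check can be sketched outside of Spark. The class below, `ValidatedUDF`, is a hypothetical stand-in written only for illustration; it is not the PySpark `UserDefinedFunction` class and does not reproduce the actual patch. It assumes the validation amounts to a `callable()` test in the constructor that raises the message shown above.

```python
class ValidatedUDF:
    """Hypothetical stand-in for UserDefinedFunction (illustration only)."""

    def __init__(self, func, return_type="string"):
        # Fail at construction time rather than when the query executes.
        if not callable(func):
            raise TypeError(
                "func should be a callable object "
                "(a function or an instance of a class with __call__). "
                "Got {0}".format(type(func))
            )
        self.func = func
        self.return_type = return_type  # placeholder; a real UDF takes a Spark data type


# A non-callable argument is rejected immediately.
try:
    ValidatedUDF(None)
except TypeError as exc:
    message = str(exc)

# Callables (functions, lambdas, objects defining __call__) are accepted.
ok = ValidatedUDF(len)
```

      With a check of this kind the error surfaces at the line that creates the UDF, instead of inside an executor task wrapped in a `Py4JJavaError`.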
      
      

            People

            • Assignee:
              Maciej Szymkiewicz (zero323)
            • Reporter:
              Maciej Szymkiewicz (zero323)
