SPARK-19162, a sub-task of SPARK-19159 (PySpark UDF API improvements)

UserDefinedFunction constructor should verify that func is callable


Details

    • Type: Sub-task
    • Status: Resolved
    • Priority: Minor
    • Resolution: Fixed
    • Affects Version/s: 1.6.0, 2.0.0, 2.1.0, 2.2.0
    • Fix Version/s: 2.2.0
    • Component/s: PySpark, SQL
    • Labels: None

    Description

      Current state

      Right now, `UserDefinedFunction` performs no input type validation. It accepts non-callable objects and only fails later, when an action executes the UDF, with a hard-to-understand traceback:

      In [1]: from pyspark.sql.functions import udf
      
      In [2]: df = spark.range(0, 1)
      
      In [3]: f = udf(None)
      
      In [4]: df.select(f()).first()
      17/01/07 19:30:50 ERROR Executor: Exception in task 2.0 in stage 2.0 (TID 7)
      
      ...
      Py4JJavaError: An error occurred while calling o51.collectToPython.
      ...
      TypeError: 'NoneType' object is not callable
      ...
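The failure is deferred because the wrapper only stores `func`; nothing calls it until an action such as `first()` runs the query and a Python worker evaluates the UDF. The following is a rough pure-Python analogue with no Spark involved. `LazyUDF` and its `evaluate` method are hypothetical stand-ins for the unvalidated wrapper and the worker-side call, not PySpark APIs:

```python
class LazyUDF:
    """Hypothetical stand-in for an unvalidated UDF wrapper (not PySpark's class)."""

    def __init__(self, func):
        # No validation: any object, including None, is accepted and stored.
        self.func = func

    def evaluate(self, *args):
        # Stands in for the call made much later by a Python worker,
        # when an action such as first() actually runs the query.
        return self.func(*args)


f = LazyUDF(None)  # construction succeeds silently

try:
    f.evaluate(0)
except TypeError as e:
    print(e)  # 'NoneType' object is not callable
```

The mistake is made at construction time, but the error surfaces only at evaluation time, which in PySpark means on an executor and wrapped in a `Py4JJavaError`.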
      
      

      Proposed

      Apply basic validation to the `func` argument, so that a non-callable object is rejected as soon as the UDF is created:

      In [7]: udf(None)
      
      ---------------------------------------------------------------------------
      TypeError                                 Traceback (most recent call last)
      <ipython-input-7-0765fbe657a9> in <module>()
      ----> 1 udf(None)
      ...
      TypeError: func should be a callable object (a function or an instance of a class with __call__). Got <class 'NoneType'>
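The proposed check can be sketched with Python's built-in `callable()`. The class below is a simplified, hypothetical version, not PySpark's actual `UserDefinedFunction`: the real constructor also deals with the return type, the UDF name and the JVM-side function, all omitted here, and `returnType` is stored without any checking. Only the error message is taken from the session above:

```python
class UserDefinedFunction:
    """Hypothetical, simplified sketch of eager `func` validation."""

    def __init__(self, func, returnType=None):
        # Fail fast on the driver, at construction time, instead of
        # letting a Python worker hit the TypeError during an action.
        if not callable(func):
            raise TypeError(
                "func should be a callable object "
                "(a function or an instance of a class with __call__). "
                "Got {0}".format(type(func))
            )
        self.func = func
        self.returnType = returnType  # placeholder, not validated here


class AddOne:
    def __call__(self, x):
        return x + 1


UserDefinedFunction(lambda x: x * 2)  # plain function: accepted
UserDefinedFunction(AddOne())         # instance with __call__: accepted

try:
    UserDefinedFunction(None)
except TypeError as e:
    print(e)  # ... Got <class 'NoneType'>
```

`callable()` is deliberately permissive: it accepts functions, lambdas, `functools.partial` objects and instances whose class defines `__call__`. It also returns `True` for classes themselves, so a check this basic cannot catch every misuse; it only rejects objects that could never be called.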
      
      


          People

            Assignee: Maciej Szymkiewicz (zero323)
            Reporter: Maciej Szymkiewicz (zero323)
            Votes: 0
            Watchers: 3
