Spark / SPARK-19159 PySpark UDF API improvements / SPARK-19165

UserDefinedFunction should verify call arguments and provide readable exception in case of mismatch


Details

    • Type: Sub-task
    • Status: Resolved
    • Priority: Minor
    • Resolution: Fixed
    • Affects Version/s: 1.6.0, 2.0.0, 2.1.0, 2.2.0
    • Fix Version/s: 2.3.0
    • Component/s: PySpark, SQL
    • Labels: None

    Description

      Invalid arguments to a UDF call fail with somewhat cryptic Py4J errors:

      In [5]: f = udf(lambda x: x)
      
      In [6]: df.select(f([]))
      
      ---------------------------------------------------------------------------
      Py4JError                                 Traceback (most recent call last)
      <ipython-input-10-5fb48a5d66d2> in <module>()
      ----> 1 df.select(f([]))
      ....
      Py4JError: An error occurred while calling z:org.apache.spark.sql.functions.col. Trace:
      py4j.Py4JException: Method col([class java.util.ArrayList]) does not exist
      	at py4j.reflection.ReflectionEngine.getMethod(ReflectionEngine.java:318)
      	at py4j.reflection.ReflectionEngine.getMethod(ReflectionEngine.java:339)
      	at py4j.Gateway.invoke(Gateway.java:274)
      	at py4j.commands.AbstractCommand.invokeMethod(AbstractCommand.java:132)
      	at py4j.commands.CallCommand.execute(CallCommand.java:79)
      	at py4j.GatewayConnection.run(GatewayConnection.java:214)
      	at java.lang.Thread.run(Thread.java:745)
      
      

      It is pretty easy to perform basic input validation:

      In [8]: f = udf(lambda x: x)
      
      In [9]: f(1)
      ---------------------------------------------------------------------------
      TypeError                                 Traceback (most recent call last)
      
      ...
      TypeError: All arguments should be Columns or strings representing column names. Got 1 of type <class 'int'>
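
      A minimal sketch of such a check follows. It is not the patch merged for this issue. The helper name validate_udf_args and the stand-in Column class are assumptions made so the example runs without Spark; in PySpark the check would be against pyspark.sql.Column.

```python
class Column:
    """Hypothetical stand-in for pyspark.sql.Column so the sketch runs without Spark."""

    def __init__(self, name):
        self.name = name


def validate_udf_args(*cols):
    """Raise a readable TypeError unless every argument is a Column or a column name.

    Hypothetical helper; intended to run before the arguments are handed to the JVM.
    """
    for col in cols:
        if not isinstance(col, (Column, str)):
            raise TypeError(
                "All arguments should be Columns or strings representing "
                "column names. Got {0} of type {1}".format(col, type(col))
            )
    return cols
```

      Calling validate_udf_args(1) or validate_udf_args([]) raises the TypeError in Python, before any Py4J call is made, while validate_udf_args("x", Column("y")) passes through unchanged.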
      
      

      This could be extended further to check the expected number of arguments or even, with some form of type annotations, the SQL types.
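
      One way the argument-count check might look, as a sketch only: the helper name check_arg_count is hypothetical, and it relies on the standard library's inspect.signature rather than anything in PySpark.

```python
import inspect


def check_arg_count(func, args):
    """Raise a readable TypeError if func cannot accept len(args) positional arguments.

    Hypothetical helper; func is the plain Python function wrapped by the UDF.
    """
    try:
        # bind() applies normal call semantics without actually calling func.
        inspect.signature(func).bind(*args)
    except TypeError as exc:
        name = getattr(func, "__name__", repr(func))
        raise TypeError(
            "UDF {0} called with {1} argument(s): {2}".format(name, len(args), exc)
        ) from None
```

      With f = lambda x: x, check_arg_count(f, ["a"]) returns silently, while check_arg_count(f, ["a", "b"]) raises a TypeError that names the function and the number of arguments received.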

          People

            Assignee: Hyukjin Kwon (gurwls223)
            Reporter: Maciej Szymkiewicz (zero323)