Details

    • Type: Sub-task
    • Status: Resolved
    • Priority: Major
    • Resolution: Fixed
    • Affects Version/s: 1.6.0, 2.0.0, 2.1.0, 2.2.0
    • Fix Version/s: 2.2.0
    • Component/s: PySpark, SQL

    Description

      Right now there are a few ways to create a UDF:

      • With a standalone function:
        from pyspark.sql.functions import udf
        from pyspark.sql.types import IntegerType

        def _add_one(x):
            """Adds one"""
            if x is not None:
                return x + 1

        add_one = udf(_add_one, IntegerType())
        

        This allows full control flow, including exception handling, but needs two names (`_add_one` and `add_one`) for a single UDF.

      • With a `lambda` expression:
        add_one = udf(lambda x: x + 1 if x is not None else None, IntegerType())
        

        No duplicated names, but the body is limited to a single expression, so statements such as `try`/`except` are not available.

      • Using a nested function with an immediate call:
        def add_one(c):
            def add_one_(x):
                if x is not None:
                    return x + 1
            return udf(add_one_, IntegerType())(c)
        

        Quite verbose, but it enables full control flow and clearly indicates the expected number of arguments.

      • Using the `udf` function as a decorator:
        @udf
        def add_one(x):
            """Adds one"""
            if x is not None:
                return x + 1
        

        Possible, but only with the default `returnType` (`StringType`), unless curried as `@partial(udf, returnType=IntegerType())`.
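
The curried form mentioned in the last item can be sketched without a Spark session. The `udf` below is a hypothetical stand-in, not the real `pyspark.sql.functions.udf`: it only tags the function with a return type, and plain strings stand in for `StringType()` and `IntegerType()`. The sketch is meant to show only how `functools.partial` turns a two-argument factory into a one-argument decorator.

```python
from functools import partial


def udf(f, returnType="string"):
    """Hypothetical stand-in for pyspark's udf (assumption): it records the
    return type on the function instead of building a Spark UDF."""
    f.returnType = returnType
    return f


# Bare decorator: the function is the only argument, so the default type applies.
@udf
def strip(x):
    if x is not None:
        return x.strip()


# Curried decorator: partial fixes returnType and leaves a one-argument callable.
@partial(udf, returnType="int")
def add_one(x):
    if x is not None:
        return x + 1
```

The bare form cannot carry a return type because the decorated function already occupies the first positional argument, which is why the curried form is needed here.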

      Proposed

      Add a `udf` decorator that can be used as follows:

      from pyspark.sql.decorators import udf
      
      @udf(IntegerType())
      def add_one(x):
          """Adds one"""
          if x is not None:
              return x + 1
      

      or

      @udf()
      def strip(x):
          """Strips String"""
          if x is not None:
              return x.strip()
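
One way such a decorator could be built is as a decorator factory: calling `udf(...)` returns the function that actually wraps the target. The sketch below is an illustration under stated assumptions, not the implementation that was merged into Spark. `UserDefinedFunction` here is a hypothetical stand-in that calls the Python function eagerly, and plain strings stand in for `IntegerType()` and the default `StringType()`.

```python
import functools


class UserDefinedFunction:
    """Hypothetical stand-in for a Spark UDF wrapper (assumption): it stores
    the Python function and its declared return type and calls it eagerly."""

    def __init__(self, func, returnType):
        self.func = func
        self.returnType = returnType
        # Copy __name__ and __doc__ so the decorated object stays documented.
        functools.update_wrapper(self, func)

    def __call__(self, *args):
        return self.func(*args)


def udf(returnType="string"):
    """Decorator factory: udf(returnType) returns the actual decorator."""
    def decorator(func):
        return UserDefinedFunction(func, returnType)
    return decorator


@udf("int")
def add_one(x):
    """Adds one"""
    if x is not None:
        return x + 1


@udf()
def strip(x):
    """Strips String"""
    if x is not None:
        return x.strip()
```

With this shape the parentheses are mandatory even when the default return type is wanted, which matches the `@udf()` spelling in the proposal.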
      

          People

            Assignee: zero323 Maciej Szymkiewicz
            Reporter: zero323 Maciej Szymkiewicz