
    Details

    • Type: Sub-task
    • Status: Resolved
    • Priority: Major
    • Resolution: Fixed
    • Affects Version/s: 1.6.0, 2.0.0, 2.1.0, 2.2.0
    • Fix Version/s: 2.2.0
    • Component/s: PySpark, SQL
    • Labels:

      Description

      Right now there are a few ways to create a UDF:

      • With a standalone function:
        from pyspark.sql.functions import udf
        from pyspark.sql.types import IntegerType

        def _add_one(x):
            """Adds one"""
            if x is not None:
                return x + 1

        add_one = udf(_add_one, IntegerType())
        

        This allows for full control flow, including exception handling, but needs two names (`_add_one` and `add_one`) for the same function.

      • With a `lambda` expression:
        add_one = udf(lambda x: x + 1 if x is not None else None, IntegerType())
        

        No duplicate names, but the body is limited to a single expression.

      • Using a nested function with an immediate call:
        def add_one(c):
            def add_one_(x):
                if x is not None:
                    return x + 1
            return udf(add_one_, IntegerType())(c)
        

        Quite verbose, but it enables full control flow and clearly indicates the expected number of arguments.

      • Using the `udf` function as a decorator:
        @udf
        def add_one(x):
            """Adds one"""
            if x is not None:
                return x + 1
        

        Possible, but only with the default `returnType` (`StringType`), or with a curried form such as `@partial(udf, returnType=IntegerType())`.
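
The curried form mentioned in the last item can be sketched without a Spark installation. In the snippet below, `fake_udf` is a hypothetical stand-in for `udf` that only records the requested return type as a plain string; it is not part of PySpark and exists purely to show how `functools.partial` turns a two-argument function into a usable decorator.

```python
from functools import partial


def fake_udf(f, returnType="string"):
    """Hypothetical stand-in for `udf`: tags `f` with a return type and returns it."""
    f.returnType = returnType
    return f


# partial(...) fixes returnType, leaving a one-argument callable,
# which is exactly what the decorator syntax expects.
@partial(fake_udf, returnType="integer")
def add_one(x):
    """Adds one"""
    if x is not None:
        return x + 1
```

With the real `udf`, the same pattern would pass a `DataType` instance such as `IntegerType()` instead of the string used here.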

      Proposed

      Add a `udf` decorator that can be used as follows:

      from pyspark.sql.decorators import udf
      from pyspark.sql.types import IntegerType
      
      @udf(IntegerType())
      def add_one(x):
          """Adds one"""
          if x is not None:
              return x + 1
      

      or

      @udf()
      def strip(x):
          """Strips String"""
          if x is not None:
              return x.strip()
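
One way such a decorator could be structured is as a decorator factory: calling `udf(...)` returns the actual decorator, so both `@udf(IntegerType())` and `@udf()` work. The following is a minimal sketch under stated assumptions, not the implementation proposed in this ticket. `make_udf` is a hypothetical stand-in for the existing two-argument `udf(f, returnType)` call, and return types are plain strings so the example runs without Spark.

```python
import functools


def make_udf(f, return_type):
    """Hypothetical stand-in for the existing udf(f, returnType) call."""
    @functools.wraps(f)  # preserve the wrapped function's name and docstring
    def wrapper(*args):
        return f(*args)
    wrapper.return_type = return_type
    return wrapper


def udf(return_type="string"):
    """Decorator factory: use as @udf() or @udf(some_type)."""
    def decorator(f):
        return make_udf(f, return_type)
    return decorator


@udf("integer")
def add_one(x):
    """Adds one"""
    if x is not None:
        return x + 1


@udf()
def strip(x):
    """Strips String"""
    if x is not None:
        return x.strip()
```

The decorated name is bound directly to the wrapped function, so no second name is needed, and `functools.wraps` keeps the original docstring available.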
      

            People

            • Assignee: Maciej Szymkiewicz (zero323)
            • Reporter: Maciej Szymkiewicz (zero323)
            • Votes: 0
            • Watchers: 4
