SPARK-45401 (Spark; sub-task of SPARK-43797 Python User-defined Table Functions)

Add a new method `cleanup` in the UDTF interface


Details

    • Type: Sub-task
    • Status: Resolved
    • Priority: Major
    • Resolution: Fixed
    • Affects Version/s: 3.5.0, 4.0.0
    • Fix Version/s: 4.0.0
    • Component/s: PySpark

    Description

      Currently, the terminate method of a UDTF is always executed, regardless of whether the eval method calls are successful. This is problematic. We should execute terminate only when all eval calls succeed.

      But what if users wish to perform cleanup actions during UDTF execution, such as closing connections? One option is for users to embed try...except logic within the eval method:

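      from typing import Any

      def eval(self, row: Any):
        try:
          run_code()   # placeholder for the per-row work
        except Exception:
          clean_up()   # placeholder, e.g. close a connection
          raise        # re-raise so the failure is not silently swallowed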

      However, wrapping every eval call in a try...except block can be expensive, potentially affecting the performance of UDTFs.

      To tackle this, we can introduce a new method in the UDTF interface that will be called regardless of the outcome. The logic would look like:

      try:
        eval()
        terminate()
      finally:
        cleanup()
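
      The proposed lifecycle can be sketched in plain Python, without Spark. This is only an illustration under stated assumptions: CountingUDTF and run_udtf are hypothetical names, and run_udtf stands in for whatever the Spark worker does internally. It is not the actual PySpark implementation.

```python
from typing import Any, Iterable, Iterator, List, Tuple


class CountingUDTF:
    """Hypothetical UDTF-like class; not tied to the real PySpark API."""

    def __init__(self) -> None:
        # Records which lifecycle methods ran, for illustration only.
        self.calls: List[str] = []

    def eval(self, row: Any) -> Iterator[Tuple[Any, ...]]:
        self.calls.append("eval")
        if row is None:
            raise ValueError("bad row")
        yield (row,)

    def terminate(self) -> Iterator[Tuple[Any, ...]]:
        self.calls.append("terminate")
        yield ("done",)

    def cleanup(self) -> None:
        # Would release resources such as connections.
        self.calls.append("cleanup")


def run_udtf(udtf: CountingUDTF, rows: Iterable[Any]) -> List[Tuple[Any, ...]]:
    """Hypothetical driver: terminate runs only if every eval succeeds,
    while cleanup runs regardless of the outcome."""
    out: List[Tuple[Any, ...]] = []
    try:
        for row in rows:
            out.extend(udtf.eval(row))
        out.extend(udtf.terminate())
    finally:
        udtf.cleanup()
    return out
```

      With this driver, a failing eval skips terminate but still triggers cleanup, and the try...finally is entered once per run rather than once per row.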

            People

              Assignee: Allison Wang (allisonwang-db)
              Reporter: Allison Wang (allisonwang-db)