[SPARK-43797] Python User-defined Table Functions

Details

    • Type: Umbrella
    • Status: Open
    • Priority: Major
    • Resolution: Unresolved
    • Affects Version/s: 3.5.0, 4.0.0
    • Fix Version/s: None
    • Component/s: PySpark
    • Labels: None

    Description

      This is an umbrella ticket to support Python user-defined table functions.
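
For readers new to the feature, here is a minimal sketch of the class shape a Python UDTF takes: an `eval` method that yields one tuple per output row. It is plain Python and runs without Spark. `SplitWords` and `run_locally` are hypothetical names invented for illustration; `run_locally` is only a stand-in for the engine, not a PySpark API. In PySpark 3.5 and later, a class like this would typically be registered with the `udtf` decorator from `pyspark.sql.functions`.

```python
# Plain-Python sketch of the UDTF class shape; no Spark session is required.
# Assumption: in PySpark the class would be decorated with something like
# @udtf(returnType="word: string, length: int") before being used in a query.


class SplitWords:
    """Emits one output row per whitespace-separated word in the input string."""

    def eval(self, text: str):
        # Each yielded tuple is one output row: (word, length).
        for word in text.split():
            yield (word, len(word))


def run_locally(udtf_cls, *args):
    """Hypothetical stand-in for the engine: instantiate, call eval, collect rows."""
    handler = udtf_cls()
    return list(handler.eval(*args))


rows = run_locally(SplitWords, "hello spark udtf")
print(rows)
```

A single input value can produce zero, one, or many output rows, which is what distinguishes a table function from a scalar UDF.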

        Issue Links

          1. Initial support for Python UDTFs (Sub-task, Resolved, Allison Wang)
          2. Support arrow-optimized Python UDTFs (Sub-task, Resolved, Allison Wang)
          3. Support Python UDTFs in Spark Connect (Sub-task, Resolved, Allison Wang)
          4. Support non-deterministic Python UDTFs (Sub-task, Resolved, Allison Wang)
          5. Support Python UDTFs with empty return values (Sub-task, Resolved, Allison Wang)
          6. Improve error messages for Python UDTFs with wrong number of outputs (Sub-task, Resolved, Allison Wang)
          7. Improve error messages for Python UDTF arrow type casts (Sub-task, Resolved, Allison Wang)
          8. Improve error messages for Python UDTF returning non iterable (Sub-task, Resolved, Allison Wang)
          9. Fix AssertionError when converting UDTF output to a complex type (Sub-task, Resolved, Takuya Ueshin)
          10. Improve error messages for creating Python UDTFs with pickling errors (Sub-task, Resolved, Allison Wang)
          11. Disable arrow optimization by default for Python UDTFs (Sub-task, Resolved, Allison Wang)
          12. Improve error messages for regular Python UDTFs that return non-tuple values (Sub-task, Resolved, Allison Wang)
          13. Include the name of the UDTF in the error messages generated during the function execution (Sub-task, Open, Unassigned)
          14. Support profiler for Python UDTFs (Sub-task, Open, Unassigned)
          15. Refactor PythonUDTFRunner to send its return type separately (Sub-task, Resolved, Takuya Ueshin)
          16. Support for UDTF to analyze in Python (Sub-task, Resolved, Takuya Ueshin)
          17. Support Python UDTFs with empty schema (Sub-task, Resolved, Takuya Ueshin)
          18. Query planning to support PARTITION BY and ORDER BY clause for table arguments (Sub-task, Resolved, Daniel)
          19. Add support for accumulator, broadcast, and Spark files in Python UDTF's analyze. (Sub-task, Resolved, Takuya Ueshin)
          20. Set up memory limits for analyze in Python. (Sub-task, Resolved, Takuya Ueshin)
          21. Add user guide for Python UDTFs (Sub-task, Resolved, Allison Wang)
          22. Improve the documentation for TABLE input arguments for UDTFs (Sub-task, Resolved, Daniel)
          23. Query execution to support PARTITION BY and ORDER BY clause for table arguments (Sub-task, Resolved, Daniel)
          24. Support named arguments in Python UDTF (Sub-task, Resolved, Takuya Ueshin)
          25. Cache the pandas converter for Python UDTFs (Sub-task, Resolved, Allison Wang)
          26. Make Python UDTFs by default non-deterministic (Sub-task, Resolved, Allison Wang)
          27. Add SQL query test suites for Python UDTFs (Sub-task, Resolved, Allison Wang)
          28. Refactor Arrow Python UDTF (Sub-task, Resolved, Takuya Ueshin)
          29. Improve Python UDTF arrow serializer performance (Sub-task, Open, Michael Zhang)
          30. Add API in 'analyze' method to return partitioning/ordering expressions (Sub-task, Resolved, Daniel)
          31. Project out PARTITION BY expressions before 'eval' method consumes input rows (Sub-task, Resolved, Daniel)
          32. Add a new method `cleanup` in the UDTF interface (Sub-task, Resolved, Allison Wang)
          33. Add API for 'analyze' method to return a buffer to be consumed on each class creation (Sub-task, Resolved, Daniel)
          34. Refactor analyzeInPython function to make it reusable (Sub-task, Resolved, Allison Wang)
          35. Return useful error message if UDTF returns None for non-nullable column (Sub-task, Resolved, Daniel)
          36. Return specific error messages if UDTF 'analyze' method accepts or returns wrong values (Sub-task, Resolved, Daniel)
          37. Create API to stop consuming rows from the input table (Sub-task, Resolved, Daniel)
          38. Update API for 'analyze' partitioning/ordering columns to support general expressions (Sub-task, Resolved, Daniel)
          39. Create API to acquire execution memory for 'eval' and 'terminate' methods (Sub-task, Closed, Unassigned)
          40. Create API for 'analyze' method to indicate subset of input table columns to select (Sub-task, Resolved, Daniel)
          41. Enforce that 'AnalyzeResult' 'orderBy' field is a list of pyspark.sql.functions.OrderingColumn (Sub-task, Resolved, Daniel)
          42. Create API for 'analyze' method to send input column(s) to output table unchanged (Sub-task, Resolved, Unassigned)
          43. Create API for 'analyze' method to differentiate constant NULL arguments and other types of arguments (Sub-task, Resolved, Daniel)
          44. Support running Python UDTF 'analyze' method from Spark executors (Sub-task, Resolved, Unassigned)
          45. Analyzer bug with multiple ORDER BY items for input table argument (Sub-task, Resolved, Daniel)
          46. [Bug] Partition indices are incorrect when UDTF analyze() uses both select and partitionColumns (Sub-task, Resolved, Daniel)
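
Several sub-tasks above refer to the `eval`, `terminate`, and `cleanup` methods. The sketch below illustrates one plausible reading of how they fit together for a single partition of a table argument: `eval` once per input row, `terminate` after the last row, and `cleanup` at the end even if an error occurs. It is plain Python, and the call order in `run_partition` is an assumption inferred from the sub-task titles, not a copy of Spark's runner. `CountRows` and `run_partition` are hypothetical names.

```python
class CountRows:
    """Counts the rows of one partition and emits a single summary row."""

    def __init__(self):
        self.count = 0
        self.closed = False

    def eval(self, value):
        # Consume one input row; returning None emits no output for this row.
        self.count += 1

    def terminate(self):
        # Called once after the partition's last row; may emit final rows.
        yield (self.count,)

    def cleanup(self):
        # Release any resources held by the instance.
        self.closed = True


def run_partition(udtf_cls, partition):
    """Hypothetical driver (not Spark code) showing the assumed call order."""
    handler = udtf_cls()
    out = []
    try:
        for row in partition:
            result = handler.eval(*row)
            if result is not None:
                out.extend(result)
        out.extend(handler.terminate())
    finally:
        # cleanup runs whether or not eval/terminate raised.
        handler.cleanup()
    return out, handler


out, handler = run_partition(CountRows, [(1,), (2,), (3,)])
print(out, handler.closed)
```

Keeping per-partition state on the instance is what makes `terminate` useful: it is the only point at which the handler knows it has seen every row.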

            People

              Assignee: Unassigned
              Reporter: Allison Wang (allisonwang-db)
