Spark / SPARK-28536

Reduce shuffle partitions in Python UDF tests in SQLQueryTestSuite


    Details

    • Type: Test
    • Status: Resolved
    • Priority: Major
    • Resolution: Fixed
    • Affects Version/s: 3.0.0
    • Fix Version/s: 3.0.0
    • Component/s: PySpark, SQL
    • Labels:
      None

    Description

    Currently, some SQL tests with Python UDFs take a long time.

    On my local machine:

    [info] SQLQueryTestSuite:
    [info] - udf/udf-window.sql - Scala UDF (58 seconds, 558 milliseconds)
    [info] - udf/udf-window.sql - Regular Python UDF (58 seconds, 371 milliseconds)
    [info] - udf/udf-window.sql - Scalar Pandas UDF (1 minute, 8 seconds)

    In Jenkins, this currently takes up to 9 minutes.

    In Python UDF tests, the number of shuffle partitions considerably affects test time, because each partition requires forking and communicating with external Python processes. We should reduce the number of shuffle partitions in these tests.
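    A minimal sketch of the idea, assuming a test harness that can apply Spark SQL config overrides per test case. Only `spark.sql.shuffle.partitions` is a real Spark setting; the helper `session_overrides`, the set of UDF-kind labels (copied from the log above), and the reduced value of 4 are hypothetical choices, not the actual change made for this issue.

```python
# Hypothetical sketch: choose Spark SQL config overrides per test case so that
# Python-backed UDF tests use fewer shuffle partitions. session_overrides and
# the default of 4 partitions are illustrative assumptions.

# Real Spark SQL setting that controls the number of partitions after a shuffle.
SHUFFLE_PARTITIONS_KEY = "spark.sql.shuffle.partitions"

# UDF kinds (labels as printed in the test log) that are evaluated in external
# Python worker processes rather than inside the JVM.
PYTHON_BACKED_UDFS = frozenset({"Regular Python UDF", "Scalar Pandas UDF"})


def session_overrides(udf_kind: str, reduced_partitions: int = 4) -> dict:
    """Return config overrides (string values) for one test case.

    Each shuffle partition is processed by a task that must exchange data
    with a Python worker, so fewer partitions means less process and IPC
    overhead. Test cases that do not go through Python keep the defaults.
    """
    if reduced_partitions < 1:
        raise ValueError("reduced_partitions must be at least 1")
    if udf_kind in PYTHON_BACKED_UDFS:
        return {SHUFFLE_PARTITIONS_KEY: str(reduced_partitions)}
    return {}


if __name__ == "__main__":
    for kind in ("Scala UDF", "Regular Python UDF", "Scalar Pandas UDF"):
        print(f"{kind}: {session_overrides(kind)}")
```

    With a live PySpark session, each returned override could be applied with `spark.conf.set(key, value)` before running the test's queries. Returning an empty mapping for the Scala UDF case leaves those tests on the suite's existing configuration.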


    People

    • Assignee: Hyukjin Kwon (hyukjin.kwon)
      Reporter: Hyukjin Kwon (hyukjin.kwon)
    • Votes: 0
      Watchers: 2
