Spark / SPARK-49847

PySpark compatibility with Spark Connect

Details

    • Type: Umbrella
    • Status: Open
    • Priority: Major
    • Resolution: Unresolved
    • Affects Version/s: 4.0.0
    • Fix Version/s: None
    • Component/s: Connect, PySpark
    • Labels: None

    Description

      This umbrella aims to ensure full compatibility between PySpark and Spark Connect by testing and validating that all PySpark functionality works with Spark Connect.

      The initial work adds the test_connect_compatibility.py test suite, which validates signature compatibility for core APIs such as DataFrame, Column, and SparkSession. The suite also checks for APIs and properties that are missing from Spark Connect and still need to be supported.
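As a rough illustration of what a signature compatibility check can look like, the sketch below compares the public methods of two classes with `inspect.signature`. The classes `ClassicDataFrame` and `ConnectDataFrame` and the helpers `public_methods` and `signature_mismatches` are hypothetical stand-ins, not the contents of test_connect_compatibility.py; a real check would presumably compare the classic PySpark classes against their Spark Connect counterparts.

```python
import inspect


class ClassicDataFrame:
    """Hypothetical stand-in for the classic PySpark DataFrame."""

    def select(self, *cols):
        ...

    def limit(self, num):
        ...

    def show(self, n=20, truncate=True, vertical=False):
        ...


class ConnectDataFrame:
    """Hypothetical stand-in for the Spark Connect DataFrame."""

    def select(self, *cols):
        ...

    def limit(self, n):  # parameter name differs from the classic class
        ...

    def show(self, n=20, truncate=True, vertical=False):
        ...


def public_methods(cls):
    """Return {name: function} for public functions reachable on cls."""
    return {
        name: member
        for name, member in inspect.getmembers(cls, predicate=inspect.isfunction)
        if not name.startswith("_")
    }


def signature_mismatches(classic_cls, connect_cls):
    """Return {name: (classic_sig, connect_sig)} for shared methods that differ."""
    classic = public_methods(classic_cls)
    connect = public_methods(connect_cls)
    mismatches = {}
    # Only methods present on both sides are compared here.
    for name in sorted(classic.keys() & connect.keys()):
        classic_sig = inspect.signature(classic[name])
        connect_sig = inspect.signature(connect[name])
        if classic_sig != connect_sig:
            mismatches[name] = (str(classic_sig), str(connect_sig))
    return mismatches


mismatches = signature_mismatches(ClassicDataFrame, ConnectDataFrame)
for name, (classic_sig, connect_sig) in mismatches.items():
    print(f"{name}: classic{classic_sig} vs connect{connect_sig}")
```

`inspect.Signature` equality takes parameter names, kinds, defaults, and annotations into account, so a renamed keyword argument is reported even when the arity matches.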

      Key goals for this project:

      • Ensure that all PySpark APIs are fully functional in Spark Connect.
      • Identify discrepancies in API signatures between PySpark and Spark Connect.
      • Identify APIs and properties that are missing from Spark Connect, and add the necessary functionality.
      • Create comprehensive tests to prevent regressions and ensure long-term compatibility.
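The missing-API goal above can be sketched in a similar way: collect the public methods and properties of each class and report what the Connect side lacks. Everything here is an assumption for illustration: `ClassicSession`, `ConnectSession`, their member names, the helpers `public_members` and `missing_members`, and the `expected_missing` allowlist for members that are intentionally unsupported. None of it is taken from the actual test suite.

```python
import inspect


class ClassicSession:
    """Hypothetical stand-in for the classic SparkSession."""

    @property
    def version(self):
        ...

    @property
    def sparkContext(self):
        ...

    def sql(self, sqlQuery):
        ...

    def newSession(self):
        ...


class ConnectSession:
    """Hypothetical stand-in for the Spark Connect SparkSession."""

    @property
    def version(self):
        ...

    def sql(self, sqlQuery):
        ...


def public_members(cls):
    """Return (method_names, property_names) for the public members of cls."""
    methods, properties = set(), set()
    for name, member in inspect.getmembers(cls):
        if name.startswith("_"):
            continue
        if isinstance(member, property):
            properties.add(name)
        elif inspect.isfunction(member):
            methods.add(name)
    return methods, properties


def missing_members(classic_cls, connect_cls, expected_missing=frozenset()):
    """Return sorted (methods, properties) present on classic but absent on connect.

    Names listed in expected_missing (an assumed allowlist) are not reported.
    """
    classic_methods, classic_props = public_members(classic_cls)
    connect_methods, connect_props = public_members(connect_cls)
    missing_methods = classic_methods - connect_methods - set(expected_missing)
    missing_props = classic_props - connect_props - set(expected_missing)
    return sorted(missing_methods), sorted(missing_props)


missing_methods, missing_properties = missing_members(ClassicSession, ConnectSession)
print("missing methods:", missing_methods)
print("missing properties:", missing_properties)
```

Asserting that both lists are empty (or equal to a reviewed allowlist) inside a unit test is one way to make a newly added classic-only API fail the build until Spark Connect catches up.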

      Further work will involve extending the test coverage to all critical PySpark modules and ensuring compatibility with Spark Connect in future releases.

        There are no Sub-Tasks for this issue.

          People

            Assignee: Haejoon Lee (itholic)
            Reporter: Haejoon Lee (itholic)
            Votes: 0
            Watchers: 1
