Description
This project aims to ensure full compatibility between PySpark and Spark Connect by testing and validating that every PySpark feature also works with Spark Connect.
The initial work adds the test_connect_compatibility.py test suite, which validates signature compatibility for core APIs such as DataFrame, Column, and SparkSession. The suite also checks for APIs and properties that are missing from Spark Connect and still need to be supported.
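As a rough illustration of what a signature compatibility check can look like, the sketch below compares the public methods of two classes using the standard library's inspect module. It is not the contents of test_connect_compatibility.py. The classes ClassicExample and ConnectExample and the helpers public_methods and signature_mismatches are hypothetical stand-ins, so the example runs without a Spark installation.

```python
import inspect


def public_methods(cls):
    """Return {name: function} for public plain functions reachable on cls.

    Classmethods are skipped here because they appear as bound methods,
    not plain functions, when accessed through the class.
    """
    return {
        name: member
        for name, member in inspect.getmembers(cls, predicate=inspect.isfunction)
        if not name.startswith("_")
    }


def signature_mismatches(classic_cls, connect_cls):
    """Return {name: (classic_sig, connect_sig)} for shared methods that differ."""
    classic = public_methods(classic_cls)
    connect = public_methods(connect_cls)
    mismatches = {}
    # Only compare methods that exist on both sides.
    for name in sorted(classic.keys() & connect.keys()):
        classic_sig = inspect.signature(classic[name])
        connect_sig = inspect.signature(connect[name])
        if classic_sig != connect_sig:
            mismatches[name] = (str(classic_sig), str(connect_sig))
    return mismatches


# Hypothetical stand-ins; a real test would import the actual
# classic and Spark Connect classes instead.
class ClassicExample:
    def load(self, *paths): ...
    def take(self, num): ...
    def pick(self, fraction, seed=None): ...


class ConnectExample:
    def load(self, *paths): ...
    def take(self, n): ...  # parameter name differs from the classic side
    def pick(self, fraction, seed=None): ...


print(signature_mismatches(ClassicExample, ConnectExample))
# → {'take': ('(self, num)', '(self, n)')}
```

Comparing inspect.Signature objects is strict: parameter names, kinds, defaults, and annotations all have to match, which suits a compatibility check where keyword arguments must behave the same on both sides.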
Key goals for this project:
- Ensure that all PySpark APIs are fully functional in Spark Connect.
- Identify discrepancies in API signatures between PySpark and Spark Connect.
- Identify APIs and properties that are missing from Spark Connect, and add the necessary functionality.
- Create comprehensive tests to prevent regressions and ensure long-term compatibility.
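One possible way to detect missing APIs and properties, as called for in the goals above, is to diff the public members of a classic class against its Spark Connect counterpart. The sketch below is only an assumption about how such a check might be structured. ClassicExample, ConnectExample, public_members, and missing_members are hypothetical names used for illustration, not part of PySpark.

```python
import inspect


def public_members(cls):
    """Split the public attributes of cls into method names and property names."""
    methods, properties = set(), set()
    for name, member in inspect.getmembers(cls):
        if name.startswith("_"):
            continue
        if isinstance(member, property):
            properties.add(name)
        elif callable(member):
            methods.add(name)
    return methods, properties


def missing_members(classic_cls, connect_cls):
    """Return the methods and properties present on classic_cls but not on connect_cls."""
    classic_methods, classic_props = public_members(classic_cls)
    connect_methods, connect_props = public_members(connect_cls)
    return {
        "methods": sorted(classic_methods - connect_methods),
        "properties": sorted(classic_props - connect_props),
    }


# Hypothetical stand-ins; a real test would import the actual classes.
class ClassicExample:
    @property
    def name(self): ...

    @property
    def settings(self): ...

    def run(self, start, end=None): ...

    def close(self): ...


class ConnectExample:
    @property
    def name(self): ...

    def run(self, start, end=None): ...


print(missing_members(ClassicExample, ConnectExample))
# → {'methods': ['close'], 'properties': ['settings']}
```

A regression test could assert that the returned lists equal a known, explicitly maintained set of expected gaps, so that any newly introduced gap fails the test.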
Further work will involve extending the test coverage to all critical PySpark modules and ensuring compatibility with Spark Connect in future releases.