SPARK-40645 (Spark; sub-task of SPARK-41279 Feature parity: DataFrame API in Spark Connect)

Throw exception for collect() and recommend using toPandas()


Details

    • Type: Sub-task
    • Status: Resolved
    • Priority: Major
    • Resolution: Fixed
    • Affects Version/s: 3.4.0
    • Fix Version/s: 3.4.0
    • Component/s: Connect
    • Labels: None

    Description

      Currently, `collect()` in the Spark Connect Python client returns a pandas DataFrame, which does not match the PySpark DataFrame API, where `collect()` returns a list of `Row` objects: https://github.com/apache/spark/blob/ceb8527413288b4d5c54d3afd76d00c9e26817a1/python/pyspark/sql/connect/data_frame.py#L227.

      The underlying implementation already produces a pandas DataFrame, however. We can therefore expose that result through `toPandas()` and make `collect()` throw an exception that recommends using `toPandas()` instead.
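      The proposed behavior can be sketched as follows. This is a minimal, hypothetical illustration, not the actual Spark Connect code: the class name `ConnectDataFrameSketch`, the `fetch_pandas` callable (standing in for the existing code path that executes the plan and returns a pandas DataFrame), and the choice of `NotImplementedError` as the exception type are all assumptions.

```python
from typing import Any, Callable


class ConnectDataFrameSketch:
    """Hypothetical stand-in for the Spark Connect Python DataFrame.

    `fetch_pandas` is an assumed callable that runs the query and returns
    the pandas DataFrame the client already produces.
    """

    def __init__(self, fetch_pandas: Callable[[], Any]) -> None:
        self._fetch_pandas = fetch_pandas

    def toPandas(self) -> Any:
        # Expose the existing pandas-producing code path under the name
        # PySpark users expect for a pandas result.
        return self._fetch_pandas()

    def collect(self) -> Any:
        # PySpark's collect() returns a list of Row objects. Returning a
        # pandas DataFrame here would silently diverge from that contract,
        # so fail loudly and point users at toPandas() instead.
        # (NotImplementedError is an assumed exception type.)
        raise NotImplementedError(
            "collect() is not supported yet in Spark Connect; use toPandas() instead."
        )


if __name__ == "__main__":
    # A plain dict stands in for a pandas DataFrame to keep the sketch dependency-free.
    df = ConnectDataFrameSketch(lambda: {"id": [1, 2, 3]})
    print(df.toPandas())
    try:
        df.collect()
    except NotImplementedError as err:
        print(f"collect() failed: {err}")
```

      Raising an error, rather than returning a differently shaped result, means code written against the PySpark `collect()` contract fails immediately with an actionable message instead of breaking later on unexpected pandas behavior.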



          People

            Assignee: Rui Wang (amaliujia)
            Reporter: Rui Wang (amaliujia)
            Votes: 0
            Watchers: 2

            Dates

              Created:
              Updated:
              Resolved: