[SPARK-40645] Throw exception for Collect() and recommend to use toPandas() - ASF JIRA

Details

Type: Sub-task
Status: Resolved
Priority: Major
Resolution: Fixed
Affects Version/s: 3.4.0
Fix Version/s: 3.4.0
Component/s: Connect
Labels: None

Description

Currently, the Connect `Collect()` returns a pandas DataFrame, which does not match the PySpark DataFrame API: https://github.com/apache/spark/blob/ceb8527413288b4d5c54d3afd76d00c9e26817a1/python/pyspark/sql/connect/data_frame.py#L227.

The underlying implementation already generates a pandas DataFrame, though. Given that, we can expose the existing behavior through `toPandas()` and throw an exception for `Collect()`.
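
A minimal sketch of the proposed behavior, for illustration only. The class name `SketchDataFrame`, the `execute_plan` hook, the exception type, and the error message are all assumptions made for this example; they are not taken from the linked pull request. The method is spelled `collect()` here because that is the name in the PySpark DataFrame API.

```python
from typing import Any, Callable, List


class SketchDataFrame:
    """Hypothetical stand-in for the Connect client DataFrame (not the real class)."""

    def __init__(self, execute_plan: Callable[[], Any]) -> None:
        # `execute_plan` is an assumed hook that runs the query and returns
        # the client-side result (a pandas DataFrame in the real client).
        self._execute_plan = execute_plan

    def collect(self) -> List[Any]:
        # PySpark's DataFrame.collect() returns a list of Row objects, so
        # returning a pandas DataFrame here would not match that API.
        # Fail loudly and point the caller at toPandas() instead.
        raise NotImplementedError(
            "collect() is not supported; please use toPandas() instead."
        )

    def toPandas(self) -> Any:
        # The result that collect() used to return is exposed under the
        # name that matches what is actually produced.
        return self._execute_plan()
```

With this shape, a caller that previously relied on `collect()` gets an explicit error naming the replacement, rather than silently receiving a type that differs from what PySpark returns.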

Issue Links

links to

[Github] Pull Request #38089 (amaliujia)

People

Assignee: Rui Wang

Reporter: Rui Wang

Votes: 0

Watchers: 2

Dates

Created: 04/Oct/22 03:18

Updated: 12/Dec/22 18:10

Resolved: 05/Oct/22 02:23