Details
- Type: Wish
- Status: Resolved
- Priority: Major
- Resolution: Duplicate
- Affects Version/s: 2.3.3
- None
- None
Description
The toLocalIterator method fetches partitions to the driver one at a time. However, as far as I can see, the computation required for a not-yet-fetched partition is not started until that partition is requested. Effectively, only one partition is computed at a time.
Desired behavior: start computing all partitions immediately, while still downloading one partition at a time.
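To illustrate the requested behavior, here is a minimal sketch in plain Python. It does not use Spark and is not how Spark implements toLocalIterator; it only models the idea with a thread pool. The names compute_partition and eager_local_iterator are hypothetical and exist only for this example.

```python
from concurrent.futures import ThreadPoolExecutor


def compute_partition(index):
    """Hypothetical stand-in for the work needed to produce one partition."""
    return [index * 10 + offset for offset in range(3)]


def eager_local_iterator(num_partitions, max_workers=4):
    """Yield rows partition by partition, submitting all partitions up front."""
    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        # Submit every partition immediately so that computation can overlap
        # (up to max_workers partitions run concurrently; the rest queue).
        futures = [pool.submit(compute_partition, i) for i in range(num_partitions)]
        # Consume results strictly in partition order, one partition at a time.
        for future in futures:
            for row in future.result():
                yield row


rows = list(eager_local_iterator(3))
```

The consumer still sees rows in partition order and holds only one partition's result at a time while iterating, but later partitions are already being computed while earlier ones are consumed. In this simplified model, finished results wait in memory until they are read, which is a trade-off a real implementation would have to bound.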
Issue Links
- duplicates SPARK-27659 Allow PySpark toLocalIterator to prefetch data (Resolved)
- is duplicated by SPARK-29852 Implement parallel preemptive RDD.toLocalIterator and Dataset.toLocalIterator (Resolved)