[SPARK-27025] Speed up toLocalIterator - ASF JIRA

Details

Type: Wish
Status: Resolved
Priority: Major
Resolution: Duplicate
Affects Version/s: 2.3.3
Fix Version/s: None
Component/s: Spark Core
Labels: None

Description

The toLocalIterator method fetches partitions to the driver one at a time. However, as far as I can see, the computation needed for a partition that has not yet been fetched does not start until that partition is requested. Effectively, only one partition is computed at a time.

Desired behavior: start computing all partitions immediately, while still downloading only one partition at a time to the driver.
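
The requested behavior can be illustrated with a small model in plain Python. This is only a sketch of the idea, not Spark code: each "partition" is assumed to be a zero-argument function that returns a list of rows, a thread pool stands in for the executors, and `eager_local_iterator` is a hypothetical name chosen for this example. All partition computations are submitted up front, but the caller still receives rows one partition at a time, in partition order.

```python
from concurrent.futures import ThreadPoolExecutor
from typing import Callable, Iterator, List, Sequence, TypeVar

T = TypeVar("T")


def eager_local_iterator(
    partition_tasks: Sequence[Callable[[], List[T]]],
    max_workers: int = 4,
) -> Iterator[T]:
    """Start every partition computation now; hand rows back partition by partition.

    Hypothetical helper for illustration only. The thread pool models the
    cluster, and each task models the computation of one partition.
    """
    executor = ThreadPoolExecutor(max_workers=max_workers)
    # Submitting all tasks before returning is what makes the computation eager.
    futures = [executor.submit(task) for task in partition_tasks]

    def rows() -> Iterator[T]:
        try:
            for future in futures:
                # Block only on the next partition in order, then stream its rows.
                yield from future.result()
        finally:
            # Release the pool once the caller is done (or abandons the iterator).
            executor.shutdown(wait=False)

    return rows()
```

Note that the eager submission happens in the outer function rather than inside the generator, because a generator body does not run until the first row is requested. In this model the finished results wait in the local process; in a real cluster they would have to stay on the executors until fetched, which this sketch does not attempt to show.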

Issue Links

duplicates: SPARK-27659 Allow PySpark toLocalIterator to prefetch data (Resolved)

is duplicated by: SPARK-29852 Implement parallel preemptive RDD.toLocalIterator and Dataset.toLocalIterator (Resolved)

People

Assignee: Unassigned
Reporter: Erik van Oosten
Votes: 0
Watchers: 2

Dates

Created: 01/Mar/19 16:09
Updated: 12/Dec/22 18:11
Resolved: 04/Dec/19 01:58