SPARK-1839: PySpark take() does not launch a Spark job when it has to


Details

    • Type: Bug
    • Status: Resolved
    • Priority: Major
    • Resolution: Fixed
    • Affects Version/s: 1.0.0
    • Fix Version/s: 1.1.0
    • Component/s: PySpark
    • Labels: None

    Description

  If you call take() or first() on a large FilteredRDD, the driver scans all partitions locally to find the first matching items instead of launching a Spark job. If the RDD is large, the call can fail or hang.
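The behavior above can be illustrated with a minimal pure-Python sketch (not the actual PySpark source; the function name, partition layout, and predicate are illustrative assumptions). It mimics a driver that pulls partitions one at a time and filters them locally: with a sparse predicate, many partitions must be scanned before n matches are found, which is why this should run as a distributed job instead.

```python
# Hypothetical sketch of a driver-side take() over a filtered dataset.
# Not PySpark internals: it only models scanning partitions sequentially
# in the driver until n matching items have been collected.

def take_scanning_partitions(partitions, predicate, n):
    """Collect up to n items matching predicate, scanning partitions in order.

    Returns (items, partitions_scanned) so the scan cost is visible.
    """
    taken = []
    scanned = 0
    for part in partitions:
        scanned += 1
        for item in part:
            if predicate(item):
                taken.append(item)
                if len(taken) == n:
                    return taken, scanned
    return taken, scanned

# 100 partitions of 1000 ints each; the filter keeps only multiples of 50000,
# so the second match does not appear until the 51st partition.
parts = [range(i * 1000, (i + 1) * 1000) for i in range(100)]
result, scanned = take_scanning_partitions(parts, lambda x: x % 50000 == 0, 2)
# result is [0, 50000]; 51 of 100 partitions were scanned in the driver.
```

With real cluster-sized partitions, every scanned partition means shipping its data to the driver, so a sparse filter turns take(2) into a near-full scan performed on a single machine.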


          People

            Assignee: Aaron Davidson (ilikerps)
            Reporter: Hossein Falaki (falaki)
            Votes: 0
            Watchers: 1
