[SPARK-1817] RDD zip erroneous when partitions do not divide RDD count - ASF JIRA

Details

Type: Bug
Status: Resolved
Priority: Major
Resolution: Fixed
Affects Version/s: 0.9.0, 1.0.0
Fix Version/s: 1.1.0
Component/s: Spark Core
Labels: None

Description

Example:

scala> sc.parallelize(1L to 2L,4).zip(sc.parallelize(11 to 12,4)).collect
res1: Array[(Long, Int)] = Array((2,11))

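The expected result is Array((1,11), (2,12)), but only one pair comes back, and it is mismatched. More generally, the problem occurs whenever the number of partitions does not evenly divide the total number of elements in the RDD.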

See https://groups.google.com/forum/#!msg/spark-users/demrmjHFnoc/Ek3ijiXHr2MJ
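
The mechanism behind the example can be sketched in plain Scala, without a Spark context. This is a minimal illustration under stated assumptions, not the actual Spark code: it assumes that a NumericRange such as `1L to 2L` is cut into ceiling-sized chunks while an Int range such as `11 to 12` is cut at floor-based boundaries, which is consistent with the linked SPARK-1837 but is not spelled out in this issue. The names `sliceByChunk`, `sliceByPosition`, and `zipPartitions` are hypothetical helpers introduced only for this sketch.

```scala
object ZipSliceSketch {
  // Assumed scheme for Int ranges: slice i covers positions [i*n/k, (i+1)*n/k).
  def sliceByPosition[T](seq: Seq[T], numSlices: Int): Seq[Seq[T]] =
    (0 until numSlices).map { i =>
      val start = ((i * seq.length.toLong) / numSlices).toInt
      val end = (((i + 1) * seq.length.toLong) / numSlices).toInt
      seq.slice(start, end)
    }

  // Assumed scheme for NumericRange: fill slices front to back with
  // ceil(n/k) elements each, leaving trailing slices empty.
  def sliceByChunk[T](seq: Seq[T], numSlices: Int): Seq[Seq[T]] = {
    val sliceSize = (seq.length + numSlices - 1) / numSlices
    (0 until numSlices).map(i => seq.slice(i * sliceSize, (i + 1) * sliceSize))
  }

  // Model of zip: pair partition i with partition i, then pair elements
  // inside each partition; unmatched elements are silently dropped here.
  def zipPartitions[A, B](left: Seq[Seq[A]], right: Seq[Seq[B]]): Seq[(A, B)] =
    left.zip(right).flatMap { case (l, r) => l.zip(r) }

  val leftParts: Seq[Seq[Long]] = sliceByChunk(Seq(1L, 2L), 4)    // [1], [2], [], []
  val rightParts: Seq[Seq[Int]] = sliceByPosition(Seq(11, 12), 4) // [], [11], [], [12]
  val zipped: Seq[(Long, Int)] = zipPartitions(leftParts, rightParts) // only (2,11)

  def main(args: Array[String]): Unit = {
    println(leftParts)
    println(rightParts)
    println(zipped)
  }
}
```

Under these assumptions only partition 1 holds an element on both sides, so the single pair (2,11) survives, matching the output reported above. If both sides were sliced with the same scheme, each partition would line up and the sketch would yield (1,11) and (2,12).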

Issue Links

depends upon: SPARK-1837 NumericRange should be partitioned in the same way as other sequences (Resolved)

People

Assignee: Kan Zhang

Reporter: Michael Malak

Votes: 0

Watchers: 2

Dates

Created: 13/May/14 05:26

Updated: 04/Jun/14 16:16

Resolved: 04/Jun/14 05:47