[SPARK-18003] RDD zipWithIndex generates wrong results when one partition contains more than 2147483647 records - ASF JIRA

Details

Type: Bug
Status: Resolved
Priority: Major
Resolution: Fixed
Affects Version/s: None
Fix Version/s: 2.0.2, 2.1.0
Component/s: Spark Core
Labels:
- correctness

Description

RDD.zipWithIndex generates wrong results when one partition contains more than Int.MaxValue records.

When an RDD contains a partition with more than 2147483647 records, the generated indices are wrong.
For example, if partition 0 has more than 2147483647 records, its indices become:
0, 1, ..., 2147483647, -2147483648, -2147483647, -2147483646, ...
That is, the index overflows a 32-bit Int and wraps around to negative values.
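The wrap-around above is consistent with a per-partition counter held in a 32-bit `Int`; for instance, Scala's `Iterator.zipWithIndex` yields `(A, Int)` pairs, so its index cannot go past 2147483647. The sketch below is a minimal, Spark-free illustration under that assumption, not the code from the linked pull request. The helper names `zipWithIntIndex` and `zipWithLongIndex` are hypothetical, and a large partition is simulated by starting the counter at `Int.MaxValue - 2`, so the overflow appears after three records instead of about two billion.

```scala
// Minimal, Spark-free sketch of the overflow described in this issue.
// Hypothetical helpers; NOT the code from Pull Request #15550.
object ZipWithIndexOverflow {

  // Buggy pattern (assumed): the index is a 32-bit Int, so incrementing
  // past Int.MaxValue silently wraps around to Int.MinValue.
  def zipWithIntIndex[T](it: Iterator[T], start: Int): Iterator[(T, Int)] = {
    var i = start
    it.map { x =>
      val pair = (x, i)
      i += 1 // wraps from 2147483647 to -2147483648
      pair
    }
  }

  // Fixed pattern: the index is a Long for the whole iteration.
  def zipWithLongIndex[T](it: Iterator[T], start: Long): Iterator[(T, Long)] = {
    var i = start
    it.map { x =>
      val pair = (x, i)
      i += 1L
      pair
    }
  }

  def main(args: Array[String]): Unit = {
    // Simulate a partition that has already emitted Int.MaxValue - 2 records.
    val records = List("a", "b", "c", "d", "e")
    println(zipWithIntIndex(records.iterator, Int.MaxValue - 2).map(_._2).toList)
    // List(2147483645, 2147483646, 2147483647, -2147483648, -2147483647)
    println(zipWithLongIndex(records.iterator, Int.MaxValue - 2L).map(_._2).toList)
    // List(2147483645, 2147483646, 2147483647, 2147483648, 2147483649)
  }
}
```

In this sketch, keeping the counter as a `Long` from the start, rather than widening an already-wrapped `Int` index to `Long` afterwards, is what prevents the negative indices.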

Operations such as repartition or coalesce can produce very large partitions, so this bug should be fixed.

Issue Links

links to

[Github] Pull Request #15550 (WeichenXu123)

People

Assignee: Weichen Xu
Reporter: Weichen Xu
Votes: 0
Watchers: 3

Dates

Created: 19/Oct/16 05:08
Updated: 20/Oct/16 06:41
Resolved: 20/Oct/16 06:41

Time Tracking

Estimated: 24h
Remaining: 24h
Logged: Not Specified