Details
- Type: Bug
- Status: Resolved
- Priority: Major
- Resolution: Fixed
- Versions: 1.6.0, 2.0.0
Description
Spark currently assumes the number of partitions is less than 100000 and uses `%05d` padding for checkpoint part-file names.
If we exceed this number, the sort logic in ReliableCheckpointRDD breaks, because the part files are sorted and compared as strings. Lexicographic comparison yields the order part-10000, part-100000, ... instead of part-10000, part-10001, ..., part-100000, and the job fails while reconstructing the checkpointed RDD.
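A minimal Scala sketch (not Spark's actual code) of the failure mode: `%05d` padding stops aligning widths at index 100000, so string comparison misorders the filenames:

```scala
object PaddingDemo {
  def main(args: Array[String]): Unit = {
    // %05d pads to five digits, so indices up to 99999 align...
    println("part-%05d".format(9999))    // part-09999
    // ...but index 100000 produces a six-digit number
    println("part-%05d".format(100000))  // part-100000

    // String comparison then misorders the files:
    val sorted = Seq("part-10001", "part-100000", "part-10000").sorted
    println(sorted)  // List(part-10000, part-100000, part-10001)
  }
}
```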
Possible solutions:
- Bump the padding width to allow more partitions, or
- Sort the part files by extracting the numeric index as a sub-portion of the filename, rather than comparing full filenames as strings, and then verify the RDD
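The second option can be sketched as follows (an illustrative snippet, not the actual patch; the `part-` prefix handling is an assumption about the filename format):

```scala
object NumericSortSketch {
  def main(args: Array[String]): Unit = {
    val files = Seq("part-10001", "part-100000", "part-10000")
    // Extract the numeric suffix and compare it as an Int,
    // so part-100000 sorts after part-10001 as intended
    val ordered = files.sortBy(_.stripPrefix("part-").toInt)
    println(ordered)  // List(part-10000, part-10001, part-100000)
  }
}
```

This restores the intended numeric order regardless of how many digits the index has, so no fixed padding width is assumed.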