Spark / SPARK-13365

Should coalesce do anything if coalescing to the same number of partitions without shuffle?


Details

    • Type: Improvement
    • Status: Resolved
    • Priority: Major
    • Resolution: Incomplete
    • Affects Version/s: 1.6.0
    • Fix Version/s: None
    • Component/s: Spark Core

    Description

      Currently, if a user calls coalesce with the same number of partitions the RDD already has, Spark spends time doing work when it seems it shouldn't do anything at all.

      For instance, if I have an RDD with 100 partitions and run coalesce(100), it seems it should skip any computation, since the RDD already has 100 partitions. One case where I've actually seen this is when users call coalesce(1000) without shuffle, which effectively turns into a coalesce(100).

      I'm presenting this as a question, as I'm not sure whether there are use cases I haven't thought of where this would break.
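      The proposed short-circuit can be sketched in plain Scala. This is not Spark's actual RDD implementation; FakeRDD is a hypothetical stand-in used only to illustrate the idea that a no-shuffle coalesce to the current (or a larger) partition count could return the same RDD instead of building a new coalesced one.

      ```scala
      // Minimal sketch of the proposed behavior, using a hypothetical FakeRDD
      // in place of Spark's RDD.
      case class FakeRDD(numPartitions: Int) {
        def coalesce(n: Int, shuffle: Boolean = false): FakeRDD = {
          // Without shuffle, coalesce can only reduce the partition count,
          // so a request for more partitions is capped at the current count.
          val target = if (!shuffle) math.min(n, numPartitions) else n
          if (target == numPartitions) this // proposed no-op path: same RDD back
          else FakeRDD(target)              // stand-in for building a coalesced RDD
        }
      }

      object Demo {
        def main(args: Array[String]): Unit = {
          val rdd = FakeRDD(100)
          // coalesce(100), and coalesce(1000) without shuffle, both leave
          // 100 partitions, so both return the original RDD unchanged.
          assert(rdd.coalesce(100) eq rdd)
          assert(rdd.coalesce(1000) eq rdd)
          println(rdd.coalesce(1000).numPartitions)
        }
      }
      ```

      Under this sketch, both coalesce(100) and the no-shuffle coalesce(1000) described above would return the original RDD by reference, skipping any work.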

      Attachments

        Activity

          People

            Assignee: Unassigned
            Reporter: tgraves Thomas Graves
            Votes: 0
            Watchers: 3

            Dates

              Created:
              Updated:
              Resolved: