Details
- Type: Improvement
- Status: Resolved
- Priority: Major
- Resolution: Fixed
- 1.0.0
- None
Description
As of Spark 1.0, RangePartitioner passes over the data twice: once to count the records and once to sample them. As a result, sortByKey makes three passes over the data (one to count, one to sample, and one to sort).
RangePartitioner should pass over the data only once, which means removing the separate count step.
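One way to drop the separate count step is reservoir sampling, which produces a fixed-size sample and the record count in the same pass. The sketch below is plain Python rather than Spark code, and it is not taken from the actual fix. The function names `reservoir_sample_and_count` and `range_bounds`, the per-partition sample size, and the weighting scheme are all assumptions made for illustration.

```python
import random
from typing import Iterable, List, Sequence, Tuple, TypeVar

T = TypeVar("T")


def reservoir_sample_and_count(
    items: Iterable[T], k: int, seed: int = 0
) -> Tuple[List[T], int]:
    """Return (sample of up to k items, total item count) from a single pass.

    Hypothetical helper: stands in for the work done on one partition.
    """
    rng = random.Random(seed)
    reservoir: List[T] = []
    count = 0
    for item in items:
        count += 1
        if len(reservoir) < k:
            # Fill the reservoir with the first k items.
            reservoir.append(item)
        else:
            # Keep the new item with probability k / count.
            j = rng.randrange(count)
            if j < k:
                reservoir[j] = item
    return reservoir, count


def range_bounds(
    partition_results: Sequence[Tuple[List[T], int]], num_ranges: int
) -> List[T]:
    """Pick up to num_ranges - 1 split keys from weighted per-partition samples.

    Hypothetical helper: stands in for the driver-side step.
    """
    weighted = []
    for sample, count in partition_results:
        if sample:
            # Each sampled key represents this many records of its partition.
            weight = count / len(sample)
            weighted.extend((key, weight) for key in sample)
    if not weighted:
        return []
    weighted.sort(key=lambda kw: kw[0])
    total = sum(w for _, w in weighted)
    step = total / num_ranges
    bounds: List[T] = []
    cumulative, target = 0.0, step
    for key, w in weighted:
        cumulative += w
        is_new_key = not bounds or key > bounds[-1]
        if cumulative >= target and len(bounds) < num_ranges - 1 and is_new_key:
            bounds.append(key)
            target += step
    return bounds


# Two uneven "partitions": each is scanned once, yielding a sample and a count.
partitions = [range(0, 1000), range(1000, 1100)]
results = [
    reservoir_sample_and_count(p, k=20, seed=i) for i, p in enumerate(partitions)
]
bounds = range_bounds(results, num_ranges=4)
```

Because each partition reports its own count alongside its sample, the driver can weight samples from large and small partitions differently without a prior counting job. With this approach, sortByKey would need two passes (sample, then sort) instead of three.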
Issue Links
- is related to: SPARK-1021 sortByKey() launches a cluster job when it shouldn't (Resolved)