[SPARK-23005] Improve RDD.take on small number of partitions - ASF JIRA

Details

Type: Improvement
Status: Resolved
Priority: Major
Resolution: Fixed
Affects Version/s: 2.2.1
Fix Version/s: 2.3.0
Component/s: Spark Core
Labels: None

Description

In the current implementation of RDD.take, we overestimate the number of partitions we need to try by 50%:
`(1.5 * num * partsScanned / buf.size).toInt`
However, when the estimate is small, `.toInt` truncates it toward zero and undercounts the partitions to try.
E.g., 2.9 becomes 2, but it should be 3.
Use `math.ceil` to fix the problem.
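A minimal Scala sketch of the rounding difference, assuming the formula quoted above. `TakeEstimate`, `truncated`, and `ceiled` are hypothetical names used only for illustration. They are not Spark code, and the merged patch may differ in detail.

```scala
// Hypothetical helpers (not Spark's API) contrasting two ways of turning the
// 50%-overestimated partition count into an integer.
object TakeEstimate {
  // Behaviour described in the issue: .toInt truncates toward zero.
  def truncated(num: Int, partsScanned: Int, bufSize: Int): Int =
    (1.5 * num * partsScanned / bufSize).toInt

  // Proposed behaviour: round up with math.ceil before converting to Int.
  def ceiled(num: Int, partsScanned: Int, bufSize: Int): Int =
    math.ceil(1.5 * num * partsScanned / bufSize).toInt

  def main(args: Array[String]): Unit = {
    // 1.5 * 5 * 1 / 3 = 2.5
    println(truncated(5, 1, 3)) // prints 2
    println(ceiled(5, 1, 3))    // prints 3
    // The value from the description: 2.9
    println(2.9.toInt)            // prints 2
    println(math.ceil(2.9).toInt) // prints 3
  }
}
```

Rounding up keeps the 50% margin intact for small estimates. Rounding down can leave the next iteration short of partitions, which may force an extra scan.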

Issue Links

links to

[Github] Pull Request #20200 (gengliangwang)

People

Assignee: Gengliang Wang

Reporter: Gengliang Wang

Votes: 0

Watchers: 3

Dates

Created: 09/Jan/18 12:08

Updated: 10/Jan/18 02:17

Resolved: 10/Jan/18 02:17