[SPARK-4630] Dynamically determine optimal number of partitions - ASF JIRA

Details

Type: Improvement
Status: Closed
Priority: Major
Resolution: Duplicate
Affects Version/s: None
Fix Version/s: None
Component/s: Spark Core
Labels: None

Description

Partition sizes play a big part in how fast stages execute during a Spark job. The size of partitions is inversely related to the number of tasks: larger partitions mean fewer tasks. For good performance, there is a sweet spot for how large the partition processed by each task should be. If partitions are too small, the user pays a disproportionate cost in scheduling overhead. If partitions are too large, task execution slows down due to GC pressure and spilling to disk.

To improve job performance, users often hand-tune the number (and therefore the size) of partitions that the next stage receives. Factors that come into play are:
Incoming partition sizes from the previous stage
Number of available executors
Available memory per executor (taking into account spark.shuffle.memoryFraction)
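
As a rough illustration of the hand-tuning described above, the three factors can be combined into a partition count. This is only a sketch under stated assumptions, not anything Spark does: the function name, the `cores_per_executor` and `min_partition_size` parameters, and the idea that each task may use an equal share of the executor's shuffle memory are all hypothetical.

```python
import math


def suggest_num_partitions(incoming_sizes, num_executors, executor_memory,
                           shuffle_memory_fraction, cores_per_executor=1,
                           min_partition_size=1):
    """Hypothetical heuristic for choosing the next stage's partition count.

    All sizes are in bytes. shuffle_memory_fraction stands in for the value
    of spark.shuffle.memoryFraction.
    """
    total = sum(incoming_sizes)
    if total == 0:
        return 1

    # Assumption: concurrent tasks on an executor split its shuffle memory evenly.
    per_task_budget = executor_memory * shuffle_memory_fraction / cores_per_executor

    # Lower bound: enough partitions that each one fits in a task's budget,
    # which avoids the spilling and GC pressure caused by oversized partitions.
    needed_for_memory = math.ceil(total / per_task_budget)

    # Aim for one task per available slot, but never make partitions smaller
    # than min_partition_size, since tiny tasks mostly pay scheduling overhead.
    slots = num_executors * cores_per_executor
    most_useful = max(1, total // min_partition_size)

    return max(needed_for_memory, min(slots, most_useful))
```

For example, ten incoming partitions of 100 bytes each, two single-core executors with 1000 bytes of memory, and a fraction of 0.25 give a per-task budget of 250 bytes, so the sketch suggests four partitions.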

Spark has access to this data, so it should be able to size partitions automatically for the user. This feature could be turned on or off with a configuration option.

To make this happen, we propose modifying the DAGScheduler to take partition sizes into account upon stage completion. Before scheduling the next stage, the scheduler can examine the sizes of the partitions and determine the appropriate number of tasks to create. Since this change requires non-trivial modifications to the DAGScheduler, a detailed design doc will be attached before proceeding with the work.
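
One way to picture the scheduler-side step is a greedy pass over the partition sizes reported by the completed stage, grouping neighbouring partitions until a target size is reached. The sketch below is an assumption made for illustration only: `plan_tasks` and `target_task_bytes` are hypothetical names, this is not DAGScheduler code, and the actual approach was left to the design doc.

```python
def plan_tasks(partition_sizes, target_task_bytes):
    """Group contiguous partitions so each group stays near a target size.

    Returns a list of groups, each a list of partition indices that one task
    of the next stage would read. The number of groups is the number of tasks.
    Partitions larger than the target are kept whole; this sketch never splits.
    """
    groups = []
    current = []
    current_bytes = 0
    for index, size in enumerate(partition_sizes):
        # Close the current group if adding this partition would overshoot.
        if current and current_bytes + size > target_task_bytes:
            groups.append(current)
            current = []
            current_bytes = 0
        current.append(index)
        current_bytes += size
    if current:
        groups.append(current)
    return groups
```

With sizes `[10, 20, 30, 100, 5, 5]` and a target of 64 bytes, the first three partitions are merged into one task, the oversized fourth partition gets its own task, and the last two small partitions share a third.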

Issue Links

duplicates: SPARK-9850 Adaptive execution in Spark (Open)
is related to: SPARK-9872 Allow passing of 'numPartitions' to DataFrame joins (Closed)
relates to: SPARK-6377 Set the number of map output partitions for Exchange operator automatically based on the size of input tables and the reduce-side operation. (Resolved)
links to: [GitHub] Pull Request #4070 (lianhuiwang)

People

Assignee: Unassigned
Reporter: Kostas Sakellis
Votes: 12
Watchers: 44

Dates

Created: 26/Nov/14 22:10
Updated: 11/Oct/16 03:51
Resolved: 11/Oct/16 03:51