Spark SQL should support the common optimization technique known as cost estimation. For example, each logical operator should be able to estimate its cardinality based on the estimates from its children.
As a first step, a framework to support this should be added, including an interface for the aforementioned cardinality estimation and possibly other metrics (e.g. estimated physical size in bytes).
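Such a framework might look like the following minimal sketch. All names here (`Statistics`, `sizeInBytes`, the operator classes) are illustrative assumptions, not a proposal for the actual API: each logical operator exposes a `statistics` method with a crude default (the product of its children's sizes, a safe upper bound for joins), and leaf operators override it with real numbers from the data source.

```scala
// Hypothetical statistics interface for logical operators.
// A metrics container; more fields (rowCount, etc.) could be added later.
case class Statistics(sizeInBytes: BigInt)

trait LogicalPlan {
  def children: Seq[LogicalPlan]
  // Default estimate: the product of the children's sizes, which is a
  // conservative upper bound for operators such as joins. Leaf operators
  // should override this with an estimate from the underlying data source.
  def statistics: Statistics =
    Statistics(sizeInBytes = children.map(_.statistics.sizeInBytes).product)
}

// A leaf relation whose physical size is known (e.g. from file metadata).
case class Relation(name: String, knownSizeInBytes: BigInt) extends LogicalPlan {
  def children: Seq[LogicalPlan] = Nil
  override def statistics: Statistics = Statistics(knownSizeInBytes)
}

// A join simply inherits the default product-of-children estimate.
case class Join(left: LogicalPlan, right: LogicalPlan) extends LogicalPlan {
  def children: Seq[LogicalPlan] = Seq(left, right)
}
```

With this default, `Join(Relation("a", 100), Relation("b", 50)).statistics.sizeInBytes` evaluates to 5000; a tighter estimate for equi-joins could later override `statistics` in `Join` using column-level statistics.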
As a first proof-of-concept use of this framework, a simple optimization strategy for certain equi-joins should be added: if one side of a qualifying join has a small estimated physical size (below some configurable threshold), the planner should choose a broadcast join physical plan, broadcasting the small side to every node and streaming through the larger side. A similar concept exists in Hive, where it is known as a map join.
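The planner-side decision can be sketched as below. This is an assumption-laden illustration, not the actual strategy: the threshold value, the `planEquiJoin` helper, and the physical-plan names are all hypothetical, and a real implementation would read the threshold from a configuration setting and pull sizes from the operators' statistics.

```scala
// Hypothetical broadcast-threshold check for planning an equi-join.
// 10 MB is an arbitrary example value; it should come from configuration.
val broadcastJoinThreshold: BigInt = BigInt(10L * 1024 * 1024)

sealed trait PhysicalJoin
// Broadcast the named side to all nodes; stream the other side through.
case class BroadcastHashJoin(broadcastSide: String, streamSide: String) extends PhysicalJoin
// Fallback: shuffle both sides by the join keys.
case class ShuffledHashJoin(left: String, right: String) extends PhysicalJoin

def planEquiJoin(leftName: String, leftSize: BigInt,
                 rightName: String, rightSize: BigInt): PhysicalJoin =
  if (rightSize <= broadcastJoinThreshold)
    BroadcastHashJoin(broadcastSide = rightName, streamSide = leftName)
  else if (leftSize <= broadcastJoinThreshold)
    BroadcastHashJoin(broadcastSide = leftName, streamSide = rightName)
  else
    ShuffledHashJoin(leftName, rightName)
```

For example, joining a multi-gigabyte fact table against a kilobyte-sized dimension table would select `BroadcastHashJoin` with the dimension table as the broadcast side, avoiding a shuffle of the large table entirely.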