[IMPALA-9146] Limit the size of the broadcast input on build side of hash join - ASF JIRA

Details

Type: Bug
Status: Resolved
Priority: Minor
Resolution: Fixed
Affects Version/s: Impala 3.3.0
Fix Version/s: Impala 3.4.0
Component/s: None
Labels: None

Description

Because broadcast-based hash joins are chosen often, we sometimes see very large tables being broadcast, with sizes larger than the destination executor's total memory. This can happen when the cluster membership is not accurately known and the planner's comparison of broadcastCost against partitionCost happens to favor the broadcast distribution. The result is spilling and severely degraded performance. Although the DistributedPlanner does a mem_limit check before picking broadcast, the mem_limit is not accurate at that point, since it is assigned during admission control (see IMPALA-988).

Given this scenario, it is better to have, as a safety check, an explicit configurable limit on the size of the broadcast input, set to a reasonable default. The 'reasonable' default can be chosen based on analysis of existing benchmark queries and representative workloads where Impala is currently used.
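A minimal sketch of the proposed safety check follows, to make the intended decision order concrete. It is not Impala's planner code: the names `JoinCosts`, `choose_distribution`, and `broadcast_limit_bytes` are hypothetical, and treating a non-positive limit as "no limit" is an assumption made here for illustration.

```python
from dataclasses import dataclass
from enum import Enum


class Distribution(Enum):
    BROADCAST = "broadcast"
    PARTITIONED = "partitioned"


@dataclass(frozen=True)
class JoinCosts:
    """Planner estimates for one hash join (hypothetical container)."""
    broadcast_cost: float   # estimated cost of broadcasting the build side
    partition_cost: float   # estimated cost of hash-partitioning both sides
    build_side_bytes: int   # estimated size of the build-side input


def choose_distribution(costs: JoinCosts, broadcast_limit_bytes: int) -> Distribution:
    """Pick a join distribution, applying the size cap before the cost comparison.

    Assumption: a limit of zero or less disables the cap.
    """
    over_limit = (
        broadcast_limit_bytes > 0
        and costs.build_side_bytes > broadcast_limit_bytes
    )
    if over_limit:
        # The cap wins even when the cost model favors broadcast.
        return Distribution.PARTITIONED
    if costs.broadcast_cost <= costs.partition_cost:
        return Distribution.BROADCAST
    return Distribution.PARTITIONED


GIB = 1024 ** 3
# Cost model favors broadcast, but the build side exceeds an illustrative 32 GiB cap.
huge_build = JoinCosts(broadcast_cost=10.0, partition_cost=50.0, build_side_bytes=100 * GIB)
print(choose_distribution(huge_build, broadcast_limit_bytes=32 * GIB))
```

The point of checking the cap first is that it does not depend on cluster membership or mem_limit, the two inputs the description identifies as unreliable at planning time.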

People

Assignee: Aman Sinha

Reporter: Aman Sinha

Votes: 0

Watchers: 3

Dates

Created: 12/Nov/19 00:33

Updated: 10/Dec/19 01:35

Resolved: 10/Dec/19 01:35