[IMPALA-3825] Distribute runtime filter aggregation across cluster - ASF JIRA

XML

Word

Printable

JSON

Details

Type: Improvement
Status: Resolved
Priority: Major
Resolution: Fixed
Affects Version/s: Impala 2.6.0
Fix Version/s: Impala 4.4.0
Component/s: Distributed Exec
Labels:
- runtime-filters
- scalability

Target Version:

Impala 4.4.0

Description

Runtime filters can be tens of MB or more, and incasting all filters from all shuffle joins to the coordinator can put a lot of memory pressure on that node. To alleviate this we should consider spreading out the aggregation operation across the cluster, so that a different node aggregates each runtime filter.

This still restricts aggregation to #runtime-filters nodes, which will usually be less than the cluster size. If we want to smooth that out further we could use tree-based aggregation, but let's measure the benefits of simply distributing the aggregation work first.

Attachments

Issue Links

causes

IMPALA-13040 SIGSEGV in QueryState::UpdateFilterFromRemote

Resolved

is depended upon by

IMPALA-6412 Memory issues with processing of incoming global runtime filters on coordinator

Resolved

is related to

IMPALA-8687 --rpc_use_loopback may not work for runtime filter RPCs

Resolved

IMPALA-7486 Admit less memory on dedicated coordinator for admission control purposes

Resolved

IMPALA-4457 Coordinator sends out runtime filters serially

Resolved

relates to

IMPALA-6144 Coordinator threads that publish RuntimeFilters continue to run after query failure/cancellation

Resolved

IMPALA-3701 Evaluate compressing Runtime filters to save coordinator network bandwidth

Resolved

(2 relates to)

Activity

People

Assignee:: Riza Suminto

Reporter:: Henry Robinson

Votes:: 0 Vote for this issue

Watchers:: 9 Start watching this issue

Dates

Created:: 05/Jul/16 20:06

Updated:: 26/Apr/24 09:32

Resolved:: 20/Dec/23 15:58