[IMPALA-3701] Evaluate compressing Runtime filters to save coordinator network bandwidth - ASF JIRA

Details

Type: New Feature
Status: Resolved
Priority: Major
Resolution: Won't Fix
Affects Version/s: Impala 2.5.0
Fix Version/s: None
Component/s: Distributed Exec
Labels:
- runtime-filters
- scalability

Target Version: Product Backlog

Description

When running complex queries with many runtime filters on large clusters, the coordinator quickly becomes network bound because of the extra incoming and outgoing runtime filter traffic. Once the coordinator is network bound, all other fragments in the cluster are negatively affected, because they block while shuffling or broadcasting data to the coordinator node.

This bottleneck was identified when running large-scale tests on EC2 nodes with less-than-ideal network throughput.

The attached PNG shows aggregate network throughput across the 32 nodes in the cluster, with the coordinator in red.

Compression should alleviate this bottleneck, but we should also consider other solutions.
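
As a rough illustration of why compression could help, the sketch below compresses a toy filter bitmap with zlib. It is not Impala code: the bitmap layout, the 64 KiB size, the key counts, and the choice of zlib are all assumptions made for the example, and none of them come from this issue. It suggests that a sparsely populated filter shrinks a lot while a densely populated one barely shrinks, so the bandwidth saved would depend on how full the filters are.

```python
import random
import zlib

# Hypothetical filter size for this example; not taken from the issue.
FILTER_BYTES = 64 * 1024


def make_bitmap(num_bytes: int, num_keys: int, seed: int = 0) -> bytes:
    """Build a toy filter bitmap by setting one pseudo-random bit per key."""
    rng = random.Random(seed)
    bitmap = bytearray(num_bytes)
    for _ in range(num_keys):
        bit = rng.randrange(num_bytes * 8)
        bitmap[bit // 8] |= 1 << (bit % 8)
    return bytes(bitmap)


def compressed_size(bitmap: bytes) -> int:
    """Return the zlib-compressed size in bytes (level 1 favors speed)."""
    return len(zlib.compress(bitmap, 1))


# A filter built from few keys is mostly zero bits.
sparse = make_bitmap(FILTER_BYTES, num_keys=100)
# A filter built from many keys looks close to random data.
dense = make_bitmap(FILTER_BYTES, num_keys=250_000)

if __name__ == "__main__":
    for name, bitmap in (("sparse", sparse), ("dense", dense)):
        print(f"{name}: {len(bitmap)} bytes raw, "
              f"{compressed_size(bitmap)} bytes compressed")
```

If most filters in a workload look like the dense case, compression alone would save little, which is one reason to weigh other options as well.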

Attachments

- query17.sql.2.out (09/Jun/16 05:57, 1.26 MB, Mostafa Mokhtar)
- image-2016-06-08-22-55-36-966.png (09/Jun/16 05:55, 43 kB, Mostafa Mokhtar)

Issue Links

is related to

- IMPALA-3825 Distribute runtime filter aggregation across cluster (Resolved)
- IMPALA-3610 Track non-RPC memory from global runtime filters on the coordinator (Resolved)

People

Assignee: Henry Robinson

Reporter: Mostafa Mokhtar

Votes: 0

Watchers: 4

Dates

Created: 09/Jun/16 06:01

Updated: 13/Jun/20 00:09

Resolved: 13/Jun/20 00:09