[IMPALA-2564] Introduce mechanism to limit query fan-out - ASF JIRA


Details

Type: New Feature
Status: Open
Priority: Minor
Resolution: Unresolved
Affects Version/s: Impala 2.2
Fix Version/s: None
Component/s: Distributed Exec
Labels:

Target Version: Product Backlog

Description

The target use case is small queries on large clusters.

Today, Impala schedules a query on every Impalad instance regardless of how much data each instance would read. This spreads the work too thin across nodes and exposes undesirable scalability issues.

The proposal is to introduce a parameter that controls the minimum and maximum amount of data read by a single Impalad instance.
The SimpleScheduler would combine several splits so that a single Impalad satisfies the minimum size requirement before the scheduler moves on to the next node.
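The grouping step proposed above can be sketched roughly as follows. This is a minimal, hypothetical illustration, not SimpleScheduler code: it assumes each scan split is described only by its size, treats the hypothetical `min_bytes` and `max_bytes` arguments as the proposed Min/Max limits, and ignores data locality, which a real scheduler would also have to weigh. Each returned group stands for the work assigned to one Impalad instance, so the number of groups is the query's fan-out.

```python
from typing import List


def group_splits(split_sizes: List[int], min_bytes: int,
                 max_bytes: int) -> List[List[int]]:
    """Hypothetical sketch: greedily pack scan splits into per-instance groups.

    A group is closed as soon as it reaches min_bytes, and a new group is
    started early if adding the next split would push it past max_bytes.
    """
    if min_bytes > max_bytes:
        raise ValueError("min_bytes must not exceed max_bytes")

    groups: List[List[int]] = []
    current: List[int] = []
    current_bytes = 0

    for size in split_sizes:
        # Respect the maximum: close a non-empty group that this split would overflow.
        # A single split larger than max_bytes still gets its own group,
        # because this sketch never subdivides a split.
        if current and current_bytes + size > max_bytes:
            groups.append(current)
            current, current_bytes = [], 0

        current.append(size)
        current_bytes += size

        # Minimum satisfied: this instance has enough work, move to the next one.
        if current_bytes >= min_bytes:
            groups.append(current)
            current, current_bytes = [], 0

    # Leftover splits that never reached min_bytes form a final, smaller group.
    if current:
        groups.append(current)
    return groups


if __name__ == "__main__":
    splits = [10, 10, 10, 10, 10]  # split sizes, e.g. in MB for readability
    groups = group_splits(splits, min_bytes=25, max_bytes=40)
    print(groups)  # [[10, 10, 10], [10, 10]]
```

With five 10 MB splits and a 25 MB minimum, the query fans out to two instances instead of five. Setting `min_bytes` to 0 puts each split in its own group, which approximates today's spread-everywhere behavior. If there are more groups than Impalads, a scheduler would still need a policy (for example round-robin) for mapping groups to hosts; the sketch leaves that out.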


People

Assignee: Unassigned

Reporter: Mostafa Mokhtar

Votes: 0

Watchers: 2

Dates

Created: 17/Oct/15 00:08

Updated: 28/Jun/18 21:27