[IMPALA-1744] Improve query start-up efficiency. - ASF JIRA

Details

Type: Improvement
Status: Resolved
Priority: Major
Resolution: Duplicate
Affects Version/s: Impala 2.1, Impala 2.1.1
Fix Version/s: None
Component/s: None
Labels:
- performance
- ramp-up

Target Version: Product Backlog

Description

The way we initiate remote work on behalf of a query can be made more efficient in several ways. The main problems with our current approach are:
1. We issue one RPC per fragment per host. This adds up for large clusters and/or complex queries. We also ship redundant data, e.g., every fragment gets its own copy of the DescriptorTable.
2. For each fragment, we start all its fragment instances in parallel, each RPC in its own thread. For large clusters, the number of threads can be substantial.

Suggested improvements:
1. We should group the fragment instances by host, and issue only one RPC per host to start all the fragments that host should run. On the remote side, care must be taken to start the fragments in the correct order. Initiating work in this way will allow us to send only a single DescriptorTable that can be shared by all fragment instances on a particular host.
2. Use a thread pool for starting remote work to avoid excessive thread use.
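
The two suggestions can be sketched together as follows. This is only an illustration in Python, not Impala's actual coordinator code: the tuple layout, `group_by_host`, `build_request`, `start_remote_work`, and the `send_rpc` callback are all hypothetical names, and sorting by fragment index merely stands in for whatever start-up order the remote side actually requires.

```python
from collections import defaultdict
from concurrent.futures import ThreadPoolExecutor


def group_by_host(instances):
    """Group (host, fragment_idx, instance_id) tuples by host.

    Hypothetical data layout; each host's list is sorted so the receiver
    sees its instances in a deterministic order.
    """
    by_host = defaultdict(list)
    for host, fragment_idx, instance_id in instances:
        by_host[host].append((fragment_idx, instance_id))
    for host_instances in by_host.values():
        host_instances.sort()
    return dict(by_host)


def build_request(host, host_instances, descriptor_table):
    """Build one request per host carrying a single shared descriptor table."""
    return {
        "host": host,
        "descriptor_table": descriptor_table,  # shipped once per host
        "instances": host_instances,
    }


def start_remote_work(instances, descriptor_table, send_rpc, max_threads=4):
    """Issue one RPC per host through a bounded thread pool.

    send_rpc is a caller-supplied function (an assumption of this sketch)
    that takes one request and returns that host's result.
    """
    grouped = group_by_host(instances)
    requests = [
        build_request(host, host_instances, descriptor_table)
        for host, host_instances in sorted(grouped.items())
    ]
    # The pool caps concurrency at max_threads, however many hosts there are.
    with ThreadPoolExecutor(max_workers=max_threads) as pool:
        return list(pool.map(send_rpc, requests))
```

With this shape, the number of RPCs grows with the number of hosts rather than with hosts times fragments, and the number of threads is bounded by `max_threads` instead of by the number of fragment instances.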

Issue Links

relates to

IMPALA-1599 Improve query start-up time with many fragment instances

Resolved

People

Assignee: Henry Robinson

Reporter: Alexander Behm

Votes: 0

Watchers: 5

Dates

Created: 10/Feb/15 17:10

Updated: 04/Jan/17 23:58

Resolved: 04/Aug/15 17:46