[IMPALA-9654] Intra-node execution skew increase with mt_dop - ASF JIRA

Details

Type: Improvement
Status: In Progress
Priority: Major
Resolution: Unresolved
Affects Version/s: None
Fix Version/s: None
Component/s: Backend
Labels:
- multithreading
- performance

Epic Link: Multithreading upgrade path for large clusters
Target Version: Impala 4.3.0

Description

We've seen significant execution skew (a large gap between the average and maximum execution time for a scan node) on TPC-DS queries with multithreading enabled. We balance bytes well, but the number of bytes in the input files is often not correlated with the amount of work done in the scan or above it. Some causes are:

- Dynamic partition pruning leaving different instances with varying numbers of input splits.
- Different numbers of rows being filtered out by predicates and row filters, leading to skew in the rows returned from the plan.
- Different degrees of compressibility.
- Files being written in different ways, e.g. with a different schema or by a different writer.

More dynamic load balancing can address all of these causes if each scan picks up its next range only once its pipeline has finished processing the rows from the previous range. That is, with the threading model we can deal with time skew anywhere in the pipeline by balancing in the scan.
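The effect can be illustrated with a small simulation. This is only a sketch under stated assumptions: `SimRange`, `StaticMakespan` and `DynamicMakespan` are hypothetical names that do not exist in Impala, round-robin pre-assignment stands in for byte-balanced scheduling, and each range's `cost` stands in for the time its whole pipeline takes.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <functional>
#include <queue>
#include <vector>

// Hypothetical model of a scan range (not an Impala type).
struct SimRange {
  long bytes;  // what the scheduler balances on
  long cost;   // time the pipeline actually spends on this range
};

// Static assignment: ranges are handed out round-robin up front.
// Returns the finish time of the slowest instance.
long StaticMakespan(const std::vector<SimRange>& ranges, std::size_t num_instances) {
  assert(num_instances > 0);
  std::vector<long> busy(num_instances, 0);
  for (std::size_t i = 0; i < ranges.size(); ++i) {
    busy[i % num_instances] += ranges[i].cost;
  }
  return *std::max_element(busy.begin(), busy.end());
}

// Dynamic assignment: whichever instance becomes idle first picks up
// the next range from a shared queue.
long DynamicMakespan(const std::vector<SimRange>& ranges, std::size_t num_instances) {
  assert(num_instances > 0);
  // Min-heap of the times at which each instance becomes idle.
  std::priority_queue<long, std::vector<long>, std::greater<long>> idle_at;
  for (std::size_t i = 0; i < num_instances; ++i) idle_at.push(0);
  long makespan = 0;
  for (const SimRange& r : ranges) {
    long finish = idle_at.top() + r.cost;
    idle_at.pop();
    idle_at.push(finish);
    makespan = std::max(makespan, finish);
  }
  return makespan;
}
```

With two instances and four equally sized ranges whose costs are 8, 1, 8 and 1, the static assignment gives both instances the same number of bytes but finishes at time 16, while the dynamic assignment finishes at time 9.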

I think we can solve this for HDFS scans by lifting the ReaderContext up to the FragmentState (one per plan node) and making corresponding changes to the scan implementation. We would need to add a bit more machinery to support Kudu and HBase scans but I think a similar approach would work conceptually.
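Conceptually, the shared state could look something like the sketch below. It is not the actual ReaderContext or FragmentState code: `RangeDesc` and `SharedScanRanges` are hypothetical stand-ins, and the assumption is simply that all instances of one scan node on a backend claim ranges from a single shared list instead of receiving a fixed subset.

```cpp
#include <atomic>
#include <cassert>
#include <cstddef>
#include <string>
#include <utility>
#include <vector>

// Hypothetical description of one scan range (not an Impala type).
struct RangeDesc {
  std::string file;
  long offset;
  long length;
};

// Hypothetical per-plan-node state shared by every instance on a backend.
class SharedScanRanges {
 public:
  explicit SharedScanRanges(std::vector<RangeDesc> ranges)
      : ranges_(std::move(ranges)) {}

  // Claims the next unprocessed range, or returns nullptr when none are left.
  // Safe to call from several instance threads concurrently: the atomic
  // counter guarantees each index is handed out at most once.
  const RangeDesc* GetNextRange() {
    std::size_t idx = next_.fetch_add(1, std::memory_order_relaxed);
    return idx < ranges_.size() ? &ranges_[idx] : nullptr;
  }

 private:
  const std::vector<RangeDesc> ranges_;  // immutable after construction
  std::atomic<std::size_t> next_{0};
};
```

Each instance would call `GetNextRange()` only after its pipeline has consumed the rows of the previous range, so an instance that draws cheap ranges naturally ends up processing more of them.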

A more invasive (and probably expensive) solution is to do a local exchange above the scan node, e.g. a multi-producer multi-consumer queue.
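For illustration, a minimal bounded multi-producer multi-consumer queue might look like the following. This is a generic mutex-and-condition-variable sketch, not a proposal for the real exchange: `LocalExchangeQueue` is a hypothetical name, and a real local exchange would pass row batches and would likely need a cheaper synchronization scheme.

```cpp
#include <cassert>
#include <condition_variable>
#include <cstddef>
#include <deque>
#include <mutex>
#include <utility>

// Hypothetical bounded MPMC queue sitting between scan instances
// (producers) and the operators above the scan (consumers).
template <typename T>
class LocalExchangeQueue {
 public:
  explicit LocalExchangeQueue(std::size_t capacity) : capacity_(capacity) {
    assert(capacity > 0);
  }

  // Blocks while the queue is full. Returns false if the queue was closed.
  bool Push(T item) {
    std::unique_lock<std::mutex> lock(mu_);
    not_full_.wait(lock, [this] { return closed_ || items_.size() < capacity_; });
    if (closed_) return false;
    items_.push_back(std::move(item));
    not_empty_.notify_one();
    return true;
  }

  // Blocks while the queue is empty and still open.
  // Returns false once the queue is closed and fully drained.
  bool Pop(T* out) {
    std::unique_lock<std::mutex> lock(mu_);
    not_empty_.wait(lock, [this] { return closed_ || !items_.empty(); });
    if (items_.empty()) return false;
    *out = std::move(items_.front());
    items_.pop_front();
    not_full_.notify_one();
    return true;
  }

  // Signals that no more items will be produced and wakes all waiters.
  void Close() {
    std::lock_guard<std::mutex> lock(mu_);
    closed_ = true;
    not_empty_.notify_all();
    not_full_.notify_all();
  }

 private:
  const std::size_t capacity_;
  std::mutex mu_;
  std::condition_variable not_empty_;
  std::condition_variable not_full_;
  std::deque<T> items_;
  bool closed_ = false;
};
```

Every item crosses a lock on both the producer and the consumer side, which is one reason this approach is probably more expensive than balancing inside the scan.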

Issue Links

is duplicated by

IMPALA-9020 Move scan range queues to fragment-level and do dynamic scheduling within a node (Resolved)

IMPALA-9637 Scan range load-balancing within backend (Resolved)

Sub-Tasks

1. Dynamic intra-node load balancing for HDFS scans (Resolved, Bikramjeet Vig)
2. Dynamic intra-node load balancing for Kudu (and maybe HBase) scans (Open, Bikramjeet Vig)

People

Assignee: Bikramjeet Vig

Reporter: Tim Armstrong

Votes: 0

Watchers: 5

Dates

Created: 14/Apr/20 16:13

Updated: 23/Nov/22 12:24