[IMPALA-9792] Split Kudu scan ranges into smaller chunks for greater parallelism - ASF JIRA

Details

Type: Improvement
Status: Open
Priority: Major
Resolution: Unresolved
Affects Version/s: None
Fix Version/s: None
Component/s: Backend
Labels:
- kudu
- multithreading

Epic Link:
Multithreading upgrade path for large clusters

Description

We currently use one thread to scan each tablet, so scan parallelism is capped by the number of tablets and many queries end up underparallelised. Kudu added an API in ~~KUDU-2437~~ and KUDU-2670 to split scan tokens at a finer granularity.

See
https://github.com/apache/kudu/commit/22a6faa44364dec3a171ec79c15b814ad9277d8f#diff-a4afa9dba99c7612b2cb9176134ff2b0

The major downside is that the planner has to do an extra RPC to a tserver for each tablet being scanned in order to figure out key range splits. Maybe we can tie this to mt_dop >= 2, or use some heuristics to avoid these RPCs for smaller tables.
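
One way the gating idea above could look is sketched below. This is only an illustration, not Impala planner code: the `TabletInfo` type, the function names and both thresholds are hypothetical, and it assumes a per-tablet size estimate is already available without the extra tserver RPC.

```python
from dataclasses import dataclass
from typing import Iterable, List

# Hypothetical thresholds for illustration only; not Impala defaults.
MIN_MT_DOP_FOR_SPLIT = 2
MIN_TABLET_BYTES_FOR_SPLIT = 256 * 1024 * 1024


@dataclass(frozen=True)
class TabletInfo:
    """Hypothetical per-tablet summary the planner is assumed to have."""
    tablet_id: str
    estimated_bytes: int  # assumed to come from existing stats, not a new RPC


def should_split(tablet: TabletInfo, mt_dop: int) -> bool:
    """Return True if the key-range split RPC looks worthwhile for this tablet."""
    # Without multithreaded execution, finer scan ranges add RPC cost only.
    if mt_dop < MIN_MT_DOP_FOR_SPLIT:
        return False
    # Skip small tablets: one scanner thread is assumed to be enough for them.
    return tablet.estimated_bytes >= MIN_TABLET_BYTES_FOR_SPLIT


def tablets_needing_split_rpc(tablets: Iterable[TabletInfo], mt_dop: int) -> List[str]:
    """List the ids of tablets for which the planner would issue the split RPC."""
    return [t.tablet_id for t in tablets if should_split(t, mt_dop)]
```

With a check like this, a query run with mt_dop below 2, or one that only touches small tablets, would issue no extra RPCs and keep the current one-range-per-tablet behaviour.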