Details
- Type: Improvement
- Status: Open
- Priority: Major
- Resolution: Unresolved
Description
We currently use one thread to scan each tablet, which can under-parallelise queries in many cases. Kudu added an API (KUDU-2437, KUDU-2670) that splits scan tokens at a finer granularity than one per tablet.
The major downside is that the planner has to issue an extra RPC to a tserver for each tablet being scanned to determine the key range splits. We could enable this only when mt_dop >= 2, or use heuristics to skip these RPCs for smaller tables.
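As a rough illustration of why finer-grained tokens help, the sketch below estimates how many scan ranges one tablet might yield for a given target split size. The function name and the ceiling-division model are assumptions made for illustration only; Kudu's actual key range splitting depends on the on-disk data layout and may produce a different count.

```python
def estimated_tokens(tablet_size_bytes: int, split_size_bytes: int) -> int:
    """Estimate the number of scan tokens for one tablet.

    Hypothetical model: one token per split_size_bytes of data,
    rounded up. A non-positive split size means "do not split".
    """
    if split_size_bytes <= 0 or tablet_size_bytes <= split_size_bytes:
        # Splitting disabled, or the tablet already fits in one range.
        return 1
    # Ceiling division without floating point.
    return -(-tablet_size_bytes // split_size_bytes)
```

Under this model, the number of threads that can work on a tablet concurrently is bounded by the token count rather than fixed at one.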
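One possible shape for such a heuristic is sketched below: only pay for the extra per-tablet RPCs when the query runs with mt_dop >= 2 and the table is estimated to be large enough to benefit. The names `SplitPolicy` and `should_split_tokens`, the 1 GiB threshold, and the choice to skip splitting when the table size is unknown are all hypothetical, not existing Impala behaviour.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class SplitPolicy:
    # Both thresholds are illustrative assumptions, not tuned values.
    min_mt_dop: int = 2
    min_table_bytes: int = 1 << 30  # 1 GiB


def should_split_tokens(mt_dop: int,
                        est_table_bytes: Optional[int],
                        policy: SplitPolicy = SplitPolicy()) -> bool:
    """Decide whether the planner should request key range splits."""
    if mt_dop < policy.min_mt_dop:
        # Without multi-threaded execution, extra tokens add no parallelism.
        return False
    if est_table_bytes is None:
        # Assumption: with no size estimate, avoid the extra RPCs.
        return False
    return est_table_bytes >= policy.min_table_bytes
```

For example, `should_split_tokens(4, 10 << 30)` returns True, while `should_split_tokens(1, 10 << 30)` and `should_split_tokens(4, 1 << 20)` both return False.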
Issue Links
- causes
  - IMPALA-10245 Test fails in TestKuduReadTokenSplit.test_kudu_scanner (Resolved)
- is related to
  - KUDU-2670 Splitting more tasks for spark job, and add more concurrent for scan operation (Open)
  - KUDU-2437 Split a tablet into primary key ranges by size (Resolved)
- relates to
  - IMPALA-9656 Dynamic intra-node load balancing for Kudu (and maybe HBase) scans. (Open)