[IGNITE-6089] SQL: Improve query parallelism architecture - ASF JIRA

XML

Word

Printable

JSON

Details

Type: Task
Status: Open
Priority: Major
Resolution: Unresolved
Affects Version/s: 2.1
Fix Version/s: None
Component/s: sql
Labels:
- performance
- sql-engine

Description

Currently query parallelism implement with static split of all indexes (including PK) for cache. This approach has several major disadvantages:
1) It improves scans, but slows down index and range lookups
2) Tables with different DOP cannot be used in the same query

We need to fix that. Proposed plan:
1) No more index splits, ever - there is one and only one index always
2) Use preliminary execution plan, statistics (~~IGNITE-6079~~), CPU cores count and CPU load to estimate whether query will benefit from parallelism.
3) if yes - split node-s single map query into several independent pieces.

Splitting can be achieved in one of the following ways:
1) Partition-based: e.g. if node owns partitions A, B, C and D, then we can split it to two queries - one over (A, B), another over (C, D). This could be useful for pure scans (e.g. DWH)
2) Histogram-based: e.g. if we have a query SELECT ... WHERE salary > 50, and we know salary distribution, we can split it into WHERE salary > 50 AND salary <= 200 and WHERE salary > 200

Attachments

Issue Links

depends upon

IGNITE-6079 SQL: implement base table statistics

Closed

Activity

People

Assignee:: Unassigned

Reporter:: Vladimir Ozerov

Votes:: 0 Vote for this issue

Watchers:: 4 Start watching this issue

Dates

Created:: 16/Aug/17 14:53

Updated:: 21/Mar/18 12:58