Details
-
Task
-
Status: Open
-
Major
-
Resolution: Unresolved
-
2.1
-
None
Description
Currently query parallelism implement with static split of all indexes (including PK) for cache. This approach has several major disadvantages:
1) It improves scans, but slows down index and range lookups
2) Tables with different DOP cannot be used in the same query
We need to fix that. Proposed plan:
1) No more index splits, ever - there is one and only one index always
2) Use preliminary execution plan, statistics (IGNITE-6079), CPU cores count and CPU load to estimate whether query will benefit from parallelism.
3) if yes - split node-s single map query into several independent pieces.
Splitting can be achieved in one of the following ways:
1) Partition-based: e.g. if node owns partitions A, B, C and D, then we can split it to two queries - one over (A, B), another over (C, D). This could be useful for pure scans (e.g. DWH)
2) Histogram-based: e.g. if we have a query SELECT ... WHERE salary > 50, and we know salary distribution, we can split it into WHERE salary > 50 AND salary <= 200 and WHERE salary > 200
Attachments
Issue Links
- depends upon
-
IGNITE-6079 SQL: implement base table statistics
- Closed