[KYLIN-2826] Add basic support classes for cube planner algorithms - ASF JIRA

Details

Type: Sub-task
Status: Closed
Priority: Major
Resolution: Done
Affects Version/s: v2.1.0
Fix Version/s: v2.2.0
Component/s: None
Labels: None

Description

Cube planner aims to recommend cost-effective cuboids. Currently, the cost model considers only the row count scanned at query time. The formula is as follows:

cuboid cost = scanned row count on target cuboid * query probability

The base cuboid is always prebuilt. If only the base cuboid is prebuilt, it is the target cuboid for every other cuboid, and the (scanned row count) is expected to be large. When another cuboid is selected to be prebuilt, it becomes the target cuboid for itself and its descendant cuboids, and their (scanned row count) is expected to become smaller. Thus, the newly selected cuboid brings some benefit. We employ BPUS (benefit per unit space) for cuboid selection. The formula for the benefit of a cuboid is as follows:

cuboid benefit = (total reduced cuboid cost) / (cuboid row count)
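
The two formulas above can be illustrated with a minimal sketch. It is not Kylin's implementation: the class name CuboidBenefitSketch and its methods are hypothetical, cuboid IDs are assumed to be dimension bitmasks (so one cuboid can answer another when it contains all of its dimensions), and the target cuboid is assumed to be the prebuilt ancestor with the fewest rows.

```java
import java.util.HashMap;
import java.util.HashSet;
import java.util.Map;
import java.util.Set;

// Hypothetical illustration of the cost and benefit formulas; not Kylin code.
final class CuboidBenefitSketch {

    // Assumption: cuboid IDs are dimension bitmasks, so 'ancestor' can answer
    // 'cuboid' when it contains every dimension of 'cuboid'.
    static boolean canAnswer(long ancestor, long cuboid) {
        return (ancestor & cuboid) == cuboid;
    }

    // Scanned row count on the target cuboid, assumed here to be the
    // prebuilt ancestor with the fewest rows.
    static long scannedRows(long cuboid, Set<Long> prebuilt, Map<Long, Long> rowCount) {
        long best = Long.MAX_VALUE;
        for (long p : prebuilt) {
            if (canAnswer(p, cuboid)) {
                best = Math.min(best, rowCount.get(p));
            }
        }
        return best;
    }

    // Sum over all cuboids of (scanned row count on target cuboid * query probability).
    static double totalCost(Set<Long> prebuilt, Map<Long, Long> rowCount, Map<Long, Double> prob) {
        double cost = 0;
        for (long c : rowCount.keySet()) {
            cost += scannedRows(c, prebuilt, rowCount) * prob.get(c);
        }
        return cost;
    }

    // cuboid benefit = (total reduced cuboid cost) / (cuboid row count)
    static double benefit(long candidate, Set<Long> prebuilt,
                          Map<Long, Long> rowCount, Map<Long, Double> prob) {
        Set<Long> withCandidate = new HashSet<>(prebuilt);
        withCandidate.add(candidate);
        double reduced = totalCost(prebuilt, rowCount, prob) - totalCost(withCandidate, rowCount, prob);
        return reduced / rowCount.get(candidate);
    }

    public static void main(String[] args) {
        // Made-up statistics: base cuboid 0b11 and two single-dimension cuboids.
        Map<Long, Long> rows = new HashMap<>();
        rows.put(0b11L, 1000L);
        rows.put(0b10L, 100L);
        rows.put(0b01L, 10L);
        Map<Long, Double> prob = new HashMap<>();
        prob.put(0b11L, 0.5);
        prob.put(0b10L, 0.25);
        prob.put(0b01L, 0.25);
        Set<Long> prebuilt = new HashSet<>();
        prebuilt.add(0b11L); // only the base cuboid is prebuilt

        System.out.println(benefit(0b10L, prebuilt, rows, prob)); // 2.25
        System.out.println(benefit(0b01L, prebuilt, rows, prob)); // 24.75
    }
}
```

With these made-up numbers, the small cuboid 0b01 saves slightly more total cost than 0b10 while occupying a tenth of the rows, so its benefit per unit space is much higher.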

Cuboid selection is based on one basic rule:

RULE 1: Cuboids with more benefit will be preferred.

For a cube, cube planner can be used in two phases.

Phase one is for normal cube building.
To use cube planner in this phase, the cube should be empty, or the build job should be refreshing its only segment. In this phase, we assume every cuboid has the same (query probability) because no query statistics are available.
Phase two is for cube optimization.
Currently, cube optimization is triggered manually. In this phase, (query probability) is taken into account, and the related query statistics are fetched from system cubes. Based on (query probability), it is possible to add missing cuboids that have no cuboid row count info. This is governed by a rule called the mandatory rule.

RULE 2: A cuboid not pre-built should be added, if it's queried frequently and the average rollup row count from its pre-built parent cuboid is large.
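
One possible reading of RULE 2 is sketched below. The class MandatoryRuleSketch, the method isMandatory, and both thresholds are hypothetical; the issue does not say how "frequently" or "large" are quantified, so the thresholds are left as parameters. The average is assumed to be the total number of rows rolled up from the pre-built parent divided by the number of queries that hit the missing cuboid.

```java
// Hypothetical illustration of the mandatory rule; not Kylin code.
final class MandatoryRuleSketch {

    /**
     * @param hitCount           queries that targeted the missing cuboid
     * @param totalRollupRows    rows aggregated from the pre-built parent across those queries
     * @param minHits            assumed threshold for "queried frequently"
     * @param minAvgRollupRows   assumed threshold for a "large" average rollup row count
     */
    static boolean isMandatory(long hitCount, long totalRollupRows,
                               long minHits, double minAvgRollupRows) {
        if (hitCount <= 0 || hitCount < minHits) {
            return false; // not queried frequently enough
        }
        double avgRollupRows = (double) totalRollupRows / hitCount;
        return avgRollupRows >= minAvgRollupRows;
    }

    public static void main(String[] args) {
        // Made-up statistics: 100 hits that rolled up 5,000,000 rows in total.
        System.out.println(isMandatory(100, 5_000_000L, 50, 10_000.0)); // true
    }
}
```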

From the above introduction, we know cube planner is based on statistics, including cuboid row count, cuboid hit frequency, etc. The class CuboidStats is introduced to provide this information to the related algorithms.

Here, we also define the interface CuboidRecommendAlgorithm for different kinds of cube planner algorithms. If there were no space limitation, pre-building all of the cuboids would bring the most benefit. However, that is not feasible in the real world. Therefore, under a space limitation, the interface defines a method to recommend a set of high-benefit cuboids.

List<Long> recommend(double expansionRate);

Here, the expansion rate is relative to the size of the base cuboid.
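
To show how a recommend(double expansionRate) method might apply RULE 1 under a space limit, here is a minimal greedy sketch. It is an illustration under stated assumptions, not the algorithm shipped with this issue: the class GreedyRecommendSketch is hypothetical, row count stands in for cuboid size, cuboid IDs are assumed to be dimension bitmasks, and the space budget is assumed to be expansionRate times the base cuboid's row count, with the base cuboid itself counted against it.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Hypothetical greedy BPUS selection; not Kylin code.
final class GreedyRecommendSketch {
    private final long baseCuboid;
    private final Map<Long, Long> rowCount; // cuboid id -> estimated row count
    private final Map<Long, Double> prob;   // cuboid id -> query probability

    GreedyRecommendSketch(long baseCuboid, Map<Long, Long> rowCount, Map<Long, Double> prob) {
        this.baseCuboid = baseCuboid;
        this.rowCount = rowCount;
        this.prob = prob;
    }

    // Sum of (rows scanned on the smallest selected ancestor * query probability).
    private double totalCost(List<Long> selected) {
        double cost = 0;
        for (Map.Entry<Long, Double> e : prob.entrySet()) {
            long cuboid = e.getKey();
            long scanned = Long.MAX_VALUE;
            for (long s : selected) {
                if ((s & cuboid) == cuboid) { // s contains all dimensions of cuboid
                    scanned = Math.min(scanned, rowCount.get(s));
                }
            }
            cost += scanned * e.getValue();
        }
        return cost;
    }

    List<Long> recommend(double expansionRate) {
        // Assumption: the budget is measured in rows and includes the base cuboid.
        double budget = expansionRate * rowCount.get(baseCuboid);
        List<Long> selected = new ArrayList<>();
        selected.add(baseCuboid);
        double used = rowCount.get(baseCuboid);
        while (true) {
            double currentCost = totalCost(selected);
            long best = -1;
            double bestBenefit = 0;
            for (long c : rowCount.keySet()) {
                if (selected.contains(c) || used + rowCount.get(c) > budget) {
                    continue; // already chosen, or does not fit in the remaining space
                }
                selected.add(c); // tentatively add the candidate
                double benefit = (currentCost - totalCost(selected)) / rowCount.get(c);
                selected.remove(selected.size() - 1);
                if (benefit > bestBenefit) {
                    bestBenefit = benefit;
                    best = c;
                }
            }
            if (best < 0) {
                return selected; // nothing fits or nothing helps
            }
            selected.add(best);
            used += rowCount.get(best);
        }
    }

    public static void main(String[] args) {
        // Made-up statistics: base cuboid 0b11 and two single-dimension cuboids.
        Map<Long, Long> rows = new HashMap<>();
        rows.put(0b11L, 1000L);
        rows.put(0b10L, 100L);
        rows.put(0b01L, 10L);
        Map<Long, Double> prob = new HashMap<>();
        prob.put(0b11L, 0.5);
        prob.put(0b10L, 0.25);
        prob.put(0b01L, 0.25);
        GreedyRecommendSketch algo = new GreedyRecommendSketch(0b11L, rows, prob);
        System.out.println(algo.recommend(1.05)); // [3, 1]
        System.out.println(algo.recommend(1.2));  // [3, 1, 2]
    }
}
```

Each round recomputes the benefit of every remaining candidate against the current selection, because selecting one cuboid changes the target cuboid, and therefore the benefit, of the others.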

People

Assignee: Zhong Yanghong

Reporter: Zhong Yanghong

Votes: 0

Watchers: 3

Dates

Created: 31/Aug/17 07:39

Updated: 03/Nov/17 16:48

Resolved: 25/Sep/17 01:10