[KYLIN-3729] CLUSTER BY CAST(field AS STRING) will accelerate base cuboid build with UHC global dict - ASF JIRA

Details

Type: Improvement
Status: Closed
Priority: Minor
Resolution: Fixed
Affects Version/s: v2.5.2
Fix Version/s: v2.6.0
Component/s: Job Engine
Labels: None

Description

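As we know, the global dict is a sliced appendTrieTree whose slices are loaded through a cache loader, so when we convert values to ids using the global dict, feeding the values in sorted order helps: consecutive lookups tend to hit the slice that is already cached.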
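The effect of input order on slice loading can be illustrated with a small simulation. This is only a sketch under stated assumptions, not Kylin's implementation: the slice layout, the LRU cache, and the names `count_slice_loads`, `slice_starts` and `cache_size` are all hypothetical.

```python
import bisect
import random
from collections import OrderedDict


def count_slice_loads(values, slice_starts, cache_size=1):
    """Count how often a slice must be loaded while encoding `values` in order.

    slice_starts: sorted list holding the first key of each slice (hypothetical layout).
    cache_size:   number of slices the LRU cache keeps in memory (assumption).
    """
    cache = OrderedDict()  # slice index -> placeholder, kept in LRU order
    loads = 0
    for v in values:
        # The slice that owns v is the last one whose start key is <= v.
        idx = max(bisect.bisect_right(slice_starts, v) - 1, 0)
        if idx in cache:
            cache.move_to_end(idx)  # mark as most recently used
        else:
            loads += 1
            cache[idx] = True
            if len(cache) > cache_size:
                cache.popitem(last=False)  # evict the least recently used slice
    return loads


# Hypothetical dictionary: 10,000 string keys split into 10 slices of 1,000 keys.
keys = sorted(str(i) for i in range(10_000))
slice_starts = keys[::1000]

random.seed(0)
shuffled = random.sample(keys, len(keys))

sorted_loads = count_slice_loads(keys, slice_starts)        # each slice loaded once
shuffled_loads = count_slice_loads(shuffled, slice_starts)  # slices reloaded repeatedly
```

With sorted input every slice is loaded exactly once; with shuffled input the same slices are evicted and reloaded many times, which is the extra read traffic that ordering avoids.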

We can already set kylin.source.hive.flat-table-cluster-by-dict-column to the UHC (ultra-high-cardinality) column so that the source data is written with CLUSTER BY uhc-column, which improves things.

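However, the appendTrieTree is ordered by string value, so for a non-string column the CLUSTER BY order does not match the dictionary order. Using CLUSTER BY CAST(uhc-column AS STRING) makes the two orders agree and gives the most benefit.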
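A short sketch of why the cast matters for a numeric column. It assumes a hypothetical dictionary whose slices are split in string order, and counts how often consecutive lookups move to a different slice; the data, the slice size and the helper `slice_switches` are illustrative only.

```python
import bisect

ids = list(range(1, 2001))                 # hypothetical numeric UHC column values
by_number = [str(i) for i in sorted(ids)]  # order given by sorting on the numeric column
by_string = sorted(str(i) for i in ids)    # order given by sorting on CAST(col AS STRING)

# Hypothetical dictionary layout: slices of 200 keys, split in string order.
slice_starts = by_string[::200]


def slice_switches(values):
    """Number of times consecutive lookups land in a different slice."""
    switches, prev = 0, None
    for v in values:
        idx = bisect.bisect_right(slice_starts, v) - 1
        if idx != prev:
            switches += 1
            prev = idx
    return switches


numeric_order_switches = slice_switches(by_number)
string_order_switches = slice_switches(by_string)
```

In string order the lookups walk through the 10 slices once. In numeric order, neighbours such as 9 and 10 sit in different slices ("9" sorts near the end, "10" near the start), so the lookups jump between slices far more often.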

With this change, the HDFS bytes read (most of which is the global dict) drop to 30% of the previous amount.

Attachments

- image-2018-12-19-12-01-20-430.png (42 kB, 19/Dec/18 04:01, Fangyuan Deng)
- image-2018-12-19-12-02-08-913.png (48 kB, 19/Dec/18 04:02, Fangyuan Deng)
- KYLIN-3729.1.patch (1 kB, 26/Dec/18 04:03, Fangyuan Deng)

People

Assignee: Fangyuan Deng

Reporter: Fangyuan Deng

Votes: 0

Watchers: 5

Dates

Created: 19/Dec/18 03:53

Updated: 27/Jan/19 14:57

Resolved: 26/Dec/18 06:13