Details
Type: Improvement
Status: Closed
Priority: Minor
Resolution: Fixed
v2.5.2
None
Description
The global dictionary is a sliced AppendTrieTree whose slices are loaded through a cache loader, so when values are converted to IDs with the global dictionary, ordered input values help: consecutive lookups tend to stay in the same slice.
We can already set kylin.source.hive.flat-table-cluster-by-dict-column to a UHC (ultra-high-cardinality) column so that the source data is CLUSTER BY that column, which improves the lookups.
However, the AppendTrieTree is ordered by string, so we can CLUSTER BY CAST(uhc-column AS STRING) instead to get the most benefit.
With this change, the HDFS bytes read (most of which is the global dictionary) are reduced to 30%.
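
The reason string ordering matters can be illustrated with a small, self-contained sketch. It is not Kylin code: the slice boundaries (`slice_starts`) and the helper `slice_switches` are hypothetical, and the dictionary is modeled simply as key ranges split by the first character of the string key. The sketch counts how often consecutive lookups move to a different slice, which stands in for a slice load from the cache loader.

```python
from bisect import bisect_right


def slice_switches(values, slice_starts):
    """Count how often consecutive lookups land in a different slice.

    slice_starts is a sorted list of the first string key of each slice
    (a hypothetical stand-in for the sliced dictionary).
    """
    switches = 0
    current = None
    for v in values:
        # Locate the slice whose key range contains the string form of v.
        idx = bisect_right(slice_starts, str(v)) - 1
        if idx != current:
            switches += 1
            current = idx
    return switches


ids = list(range(1, 1001))

# Assumed layout: one slice per leading character "1" .. "9".
slice_starts = [str(d) for d in range(1, 10)]

numeric_order = sorted(ids)          # like CLUSTER BY uhc-column on a numeric column
string_order = sorted(ids, key=str)  # like CLUSTER BY CAST(uhc-column AS STRING)

numeric_switches = slice_switches(numeric_order, slice_starts)
string_switches = slice_switches(string_order, slice_starts)

print("slice switches, numeric order:", numeric_switches)
print("slice switches, string order: ", string_switches)
```

In this toy layout the string-ordered stream visits each slice exactly once, while the numerically ordered stream returns to the same slices several times as the number of digits grows. A real dictionary has many more slices, so the actual ratio will differ.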