In the current cubing process, when the global dictionary is very large, encoding raw values into IDs for the bitmap input is slow: because the raw data records are unsorted, dictionary slices are swapped in and out frequently. We need a refined process. The idea is as follows:
- For each source data block, a mapper extracts the distinct values and sorts them.
- Encode the sorted distinct values against the global dictionary and generate a shrunken dictionary for each source data block.
- When building the base cuboid, encode each source data block with its own shrunken dictionary.
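The steps above can be sketched as follows. This is an illustrative Python sketch, not Kylin's actual implementation; `GlobalDict`, `build_shrunken_dict`, and `encode_block` are hypothetical names. The key point is that each global-dictionary lookup happens once per distinct value, in sorted order, rather than once per raw record in arbitrary order.

```python
# Sketch of the refined per-block encoding. All names here are
# illustrative assumptions, not Kylin's API.

class GlobalDict:
    """Stands in for the (possibly huge, slice-backed) global dictionary,
    which assigns a stable ID to every distinct value."""
    def __init__(self, all_values):
        self._ids = {v: i for i, v in enumerate(sorted(all_values))}

    def get_id(self, value):
        return self._ids[value]

def build_shrunken_dict(block_records, global_dict):
    """Mapper side: collect the block's distinct values, sort them, and
    look each one up once in the global dictionary (sorted access keeps
    slice swapping to a minimum). The result is a small value->ID map
    covering only this block."""
    distinct_sorted = sorted(set(block_records))
    return {v: global_dict.get_id(v) for v in distinct_sorted}

def encode_block(block_records, shrunken_dict):
    """Base-cuboid build: encode the raw values of one block using only
    its shrunken dictionary, without touching the global dictionary."""
    return [shrunken_dict[v] for v in block_records]

# Toy usage: a five-value global dictionary and one small source block.
global_dict = GlobalDict(["apple", "banana", "cherry", "date", "fig"])
block = ["cherry", "apple", "cherry", "fig"]
shrunken = build_shrunken_dict(block, global_dict)
ids = encode_block(block, shrunken)
# shrunken holds only 3 entries even though the global dict has 5.
```

Because the shrunken dictionary contains only the block's distinct values, it fits in memory during the base-cuboid build even when the full global dictionary does not.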