Details
- Type: Improvement
- Status: Closed
- Priority: Major
- Resolution: Fixed
Description
In the current cubing process, raw data records are unsorted. When the global dictionary is very large, this makes it hard to encode raw values into IDs for the bitmap input, because the dictionary slices are swapped in and out frequently. We need a refined process. The idea is as follows:
- For each source data block, a mapper generates the distinct values and sorts them.
- The mapper encodes the sorted distinct values and generates a shrunken dictionary for that source data block.
- When building the base cuboid, each source data block is encoded with its own shrunken dictionary.
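The three steps above can be illustrated with a minimal sketch. It is not Kylin's implementation: the global dictionary is modelled as a plain in-memory mapping, and the helper names (`build_global_dict`, `build_shrunken_dict`, `encode_block`) are hypothetical. The point it shows is that each block looks up only its own distinct values, in sorted order, so the large dictionary is read sequentially once per block instead of randomly once per record.

```python
def build_global_dict(values):
    """Toy stand-in for the global dictionary: sorted distinct values -> dense ids."""
    return {v: i for i, v in enumerate(sorted(set(values)))}


def build_shrunken_dict(block, global_dict):
    """Build a per-block dictionary holding only the values present in the block."""
    # Step 1: collect the block's distinct values and sort them.
    distinct_sorted = sorted(set(block))
    # Step 2: look the values up in sorted order, so that a sliced global
    # dictionary would be visited sequentially rather than at random.
    return {v: global_dict[v] for v in distinct_sorted}


def encode_block(block, shrunken_dict):
    """Step 3: encode the block's records using only its small shrunken dictionary."""
    return [shrunken_dict[v] for v in block]


# Two hypothetical source data blocks with unsorted raw values.
blocks = [["u3", "u1", "u3"], ["u9", "u2", "u1"]]
global_dict = build_global_dict(v for b in blocks for v in b)
shrunken_dicts = [build_shrunken_dict(b, global_dict) for b in blocks]
encoded = [encode_block(b, d) for b, d in zip(blocks, shrunken_dicts)]
```

Because the IDs in each shrunken dictionary are copied from the global dictionary, values encoded in different blocks remain comparable when the bitmaps are merged later.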
Attachments
Issue Links
- is a parent of: KYLIN-4940 Implement the step of "Extract Dictionary from Global Dictionary" for spark cubing engine (Closed)
- relates to: KYLIN-3424 Missing invoke addCubingGarbageCollectionSteps in the cleanup step for HBaseMROutput2Transition (Closed)