Details
- Type: Improvement
- Status: Open
- Priority: Major
- Resolution: Unresolved
- Affects Version: v3.1.1
- Fix Version: None
Description
When building a cube with the Spark engine, the first step is to encode the Hive table's rows into base cuboid data.
The existing implementation encodes row by row. If the cube has several dictionary-encoded measures, it has to use all of the dictionaries at once to encode a single row. This causes heavy memory usage and a low hit ratio in the dictionary cache.
We optimized this case by encoding column by column, which brought a significant improvement for cubes with several high-cardinality, dictionary-encoded measures.
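The difference between the two strategies can be sketched as follows. This is a minimal illustration, not Kylin's actual code: the `encode`, `encodeRowWise`, and `encodeColumnWise` helpers and the plain `HashMap` dictionaries are hypothetical stand-ins for Kylin's dictionary encoders. Row-wise encoding touches every column's dictionary for each row, so all dictionaries must stay hot at once; column-wise encoding makes one pass per column with only that column's dictionary live, which is what improves locality and cache hit ratio.

```java
import java.util.*;

public class ColumnarEncodeSketch {
    // Hypothetical per-column dictionary: maps a value to a dense integer id.
    static int encode(Map<String, Integer> dict, String v) {
        return dict.computeIfAbsent(v, k -> dict.size());
    }

    // Row by row: every row consults every column's dictionary in turn,
    // so all dictionaries are accessed interleaved (poor cache locality).
    static int[][] encodeRowWise(String[][] rows, List<Map<String, Integer>> dicts) {
        int[][] out = new int[rows.length][];
        for (int r = 0; r < rows.length; r++) {
            out[r] = new int[rows[r].length];
            for (int c = 0; c < rows[r].length; c++) {
                out[r][c] = encode(dicts.get(c), rows[r][c]);
            }
        }
        return out;
    }

    // Column by column: only one dictionary is in use per pass,
    // so its entries stay resident in the cache for the whole column.
    static int[][] encodeColumnWise(String[][] rows, List<Map<String, Integer>> dicts) {
        int[][] out = new int[rows.length][rows[0].length];
        for (int c = 0; c < rows[0].length; c++) {
            Map<String, Integer> dict = dicts.get(c);
            for (int r = 0; r < rows.length; r++) {
                out[r][c] = encode(dict, rows[r][c]);
            }
        }
        return out;
    }

    public static void main(String[] args) {
        String[][] rows = {{"a", "x"}, {"b", "x"}, {"a", "y"}};
        List<Map<String, Integer>> d1 = List.of(new HashMap<>(), new HashMap<>());
        List<Map<String, Integer>> d2 = List.of(new HashMap<>(), new HashMap<>());
        // Both orders assign the same ids; only the access pattern differs.
        System.out.println(Arrays.deepEquals(
                encodeRowWise(rows, d1), encodeColumnWise(rows, d2)));
    }
}
```

Both strategies produce identical encoded output; the optimization is purely about the memory access pattern, which matters once each dictionary is large (high cardinality) and there are several of them.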
We will refine the implementation based on Kylin 3.x and share it out.