[KYLIN-2243] TopN memory estimation is inaccurate in some cases - ASF JIRA

Details

Type: Bug
Status: Closed
Priority: Major
Resolution: Fixed
Affects Version/s: None
Fix Version/s: v2.0.0
Component/s: None
Labels: None

Description

TopNCounterSerializer.maxLength() and TopNCounterSerializer.getStorageBytesEstimate() can return inaccurate values, especially when one TopN measure has multiple "group by" columns and some of them use a long byte encoding such as "fixed_length:16".

The inaccurate estimate may cause memory issues when using in-mem cubing, and it also makes the estimate of the final cube size inaccurate.

The root cause is that a data type like "top(100)" carries no information about how long a key can be. So far a default of 4 bytes is used, which is too small when the encoding is something like "fixed_length:16". The solution is to extend the data type expression to "top(100, 16)", indicating that one key can be 16 bytes long. If the "scale" (the second argument) is absent, 4 bytes is used as the default.
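
The proposed expression can be illustrated with a minimal sketch. This is not the code from the actual fix: the class TopNTypeSketch, its parse() and estimateBytes() methods, and the assumption of an 8-byte counter per entry are all hypothetical and only show how an optional key length with a default of 4 bytes changes a size estimate.

```java
import java.util.Locale;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical illustration only; not Kylin's TopNCounterSerializer or DataType.
final class TopNTypeSketch {
    // Default key length in bytes when the second argument is absent (from the issue text).
    static final int DEFAULT_KEY_LENGTH = 4;

    // Assumed size of one counter value in bytes (an assumption for this sketch).
    static final int ASSUMED_COUNTER_BYTES = 8;

    // Accepts "top(N)" or "top(N, K)" with optional whitespace.
    private static final Pattern TOP_N =
            Pattern.compile("top\\(\\s*(\\d+)\\s*(?:,\\s*(\\d+)\\s*)?\\)");

    final int capacity;   // N: number of entries to keep
    final int keyLength;  // K: bytes per key

    private TopNTypeSketch(int capacity, int keyLength) {
        this.capacity = capacity;
        this.keyLength = keyLength;
    }

    // Parses the data type expression, falling back to the default key length.
    static TopNTypeSketch parse(String dataType) {
        Matcher m = TOP_N.matcher(dataType.trim().toLowerCase(Locale.ROOT));
        if (!m.matches()) {
            throw new IllegalArgumentException("Not a TopN data type: " + dataType);
        }
        int capacity = Integer.parseInt(m.group(1));
        int keyLength = m.group(2) == null
                ? DEFAULT_KEY_LENGTH
                : Integer.parseInt(m.group(2));
        return new TopNTypeSketch(capacity, keyLength);
    }

    // Rough per-measure estimate: one key plus one assumed counter per entry.
    long estimateBytes() {
        return (long) capacity * (keyLength + ASSUMED_COUNTER_BYTES);
    }
}
```

Under these assumptions, "top(100)" is estimated at 100 * (4 + 8) bytes, while "top(100, 16)" is estimated at 100 * (16 + 8) bytes, which shows how far off the estimate can be when the key is really 16 bytes long but the default is used.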

People

Assignee: Shao Feng Shi

Reporter: Shao Feng Shi

Votes: 0

Watchers: 4

Dates

Created: 02/Dec/16 03:40

Updated: 25/Dec/18 02:16

Resolved: 06/Mar/17 09:20