[KYLIN-1677] Distribute source data by certain columns when creating flat table - ASF JIRA

Details

Type: Improvement
Status: Closed
Priority: Major
Resolution: Fixed
Affects Version/s: None
Fix Version/s: v1.5.3
Component/s: Job Engine
Labels: None

Description

Inspired by KYLIN-1656, Kylin can distribute the source data by certain columns when creating the flat Hive table. The rows assigned to each mapper will then be more similar, so more aggregation can happen on the mapper side and less shuffle and reduce work is needed.

Columns that can be used for the distribution include: ultra-high-cardinality columns, mandatory columns, partition date/time columns, etc.
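The effect described here can be illustrated with a small simulation. This is only a sketch under stated assumptions, not Kylin's implementation: the row layout `(user_id, amount)`, the mapper count, and all function names are hypothetical. It compares how many partial-aggregate records the mappers would emit to the shuffle when rows are split arbitrarily versus when they are distributed by a high-cardinality column. In Hive, this kind of redistribution is typically expressed with a `DISTRIBUTE BY` clause.

```python
import random

NUM_MAPPERS = 4  # hypothetical number of mappers


def split_round_robin(rows, n):
    """Assign rows to n mappers without looking at their content."""
    parts = [[] for _ in range(n)]
    for i, row in enumerate(rows):
        parts[i % n].append(row)
    return parts


def split_by_column(rows, n, key_index):
    """Assign rows to n mappers by hashing one column, so equal keys share a mapper."""
    parts = [[] for _ in range(n)]
    for row in rows:
        parts[hash(row[key_index]) % n].append(row)
    return parts


def records_after_map_side_combine(parts, key_index):
    """Each mapper emits one partial aggregate per distinct key it has seen."""
    return sum(len({row[key_index] for row in part}) for part in parts)


# Synthetic source data: (user_id, amount); user_id plays the high-cardinality column.
rng = random.Random(0)
rows = [(rng.randrange(100), rng.randrange(1, 50)) for _ in range(1000)]

distinct_keys = len({row[0] for row in rows})
shuffled_rr = records_after_map_side_combine(split_round_robin(rows, NUM_MAPPERS), 0)
shuffled_dist = records_after_map_side_combine(split_by_column(rows, NUM_MAPPERS, 0), 0)

print("distinct keys:", distinct_keys)
print("records shuffled, arbitrary split:", shuffled_rr)
print("records shuffled, distributed by column:", shuffled_dist)
```

When rows are distributed by the column, every key lands on exactly one mapper, so the number of shuffled records equals the number of distinct keys. With an arbitrary split, the same key can appear on several mappers and is emitted once per mapper.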

Issue Links

relates to

KYLIN-1656 Improve performance of MRv2 engine by making each mapper handles a configured number of records

Closed

People

Assignee: Shao Feng Shi

Reporter: Shao Feng Shi

Votes: 0

Watchers: 3

Dates

Created: 11/May/16 07:06

Updated: 28/Jul/16 06:48

Resolved: 05/Jul/16 09:24