Details
- Improvement
- Status: Closed
- Major
- Resolution: Fixed
- None
- None
Description
Inspired by KYLIN-1656, Kylin can distribute the source data by certain columns when creating the flat Hive table. The rows assigned to each mapper will then be more similar to one another, so more aggregation can happen on the mapper side and less shuffle and reduce work is needed.
Columns that can be used for the distribution include: ultra-high-cardinality columns, mandatory columns, partition date/time columns, etc.
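
To illustrate why the distribution helps, here is a minimal Python sketch, not Kylin code. It simulates mappers that pre-aggregate their input and counts how many records each layout would send to the shuffle. The column names (`user_id`, `amount`), the CRC32-based partitioner, and all helper functions are assumptions made up for this example.

```python
import zlib
from collections import defaultdict


def assign_round_robin(rows, num_mappers):
    """Baseline: rows reach mappers in arrival order, ignoring their values."""
    splits = [[] for _ in range(num_mappers)]
    for i, row in enumerate(rows):
        splits[i % num_mappers].append(row)
    return splits


def assign_by_column(rows, num_mappers, column):
    """Distribute rows so that equal values of `column` share one mapper."""
    splits = [[] for _ in range(num_mappers)]
    for row in rows:
        # Deterministic hash of the column value (hypothetical partitioner).
        bucket = zlib.crc32(str(row[column]).encode()) % num_mappers
        splits[bucket].append(row)
    return splits


def map_side_aggregate(split, key_column, measure):
    """Sum `measure` per key inside one mapper, like a combiner would."""
    totals = defaultdict(int)
    for row in split:
        totals[row[key_column]] += row[measure]
    return dict(totals)


def shuffled_records(splits, key_column, measure):
    """Count the records all mappers emit after local aggregation."""
    return sum(len(map_side_aggregate(s, key_column, measure)) for s in splits)


# 8 distinct users with 4 rows each, stored consecutively in the source.
rows = [{"user_id": i // 4, "amount": 1} for i in range(32)]

baseline = shuffled_records(assign_round_robin(rows, 4), "user_id", "amount")
distributed = shuffled_records(
    assign_by_column(rows, 4, "user_id"), "user_id", "amount"
)
print("round robin:", baseline)
print("distributed by user_id:", distributed)
```

In this toy data set the round-robin layout gives every mapper one row of every user, so no local aggregation is possible and all 32 records are shuffled. Distributing by `user_id` puts each user's rows on a single mapper, so only 8 aggregated records are emitted, one per user.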
Issue Links
- relates to KYLIN-1656 Improve performance of MRv2 engine by making each mapper handles a configured number of records (Closed)