[KYLIN-1323] Improve performance of converting data to hfile - ASF JIRA

Details

Type: Improvement
Status: Closed
Priority: Major
Resolution: Fixed
Affects Version/s: v1.2
Fix Version/s: v1.3.0, v1.4.0, v1.5.2
Component/s: Job Engine
Labels: None

Description

Suppose we have 100 GB of data after cuboid building, with a setting of 10 GB per region. Currently, 10 split keys are calculated, 10 regions are created, and 10 reducers are used in the ‘convert to hfile’ step.

With this optimization, we could calculate 100 (or more) split keys and use all of them in the ‘convert to hfile’ step, but sample only 10 of them to create regions. The result is still 10 regions, but 100 reducers are used in the ‘convert to hfile’ step. Of course, 100 hfiles are created as well, and 10 files are loaded per region. That should be fine and shouldn’t affect query performance dramatically.
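
The sampling idea can be sketched roughly as follows. This is only an illustration of the arithmetic, not the code in the attached patches: the function names `plan_counts` and `sample_region_keys` and the `hfiles_per_region` parameter are hypothetical. It also assumes the usual convention that N key ranges need N-1 boundary keys, so the counts differ by one from the rounded numbers used in the description.

```python
import math


def plan_counts(total_size_gb, region_size_gb, hfiles_per_region):
    """Return (region_count, reducer_count) for the 'convert to hfile' step.

    hfiles_per_region is a hypothetical tuning knob: how many reducers
    (and therefore hfiles) should cover each region's key range.
    """
    region_count = max(1, math.ceil(total_size_gb / region_size_gb))
    reducer_count = region_count * hfiles_per_region
    return region_count, reducer_count


def sample_region_keys(hfile_split_keys, hfiles_per_region):
    """Pick every hfiles_per_region-th fine-grained key as a region boundary.

    hfile_split_keys must be sorted. With 99 fine-grained keys (100 reducer
    ranges) and hfiles_per_region=10, this keeps the keys at indices
    9, 19, ..., 89, giving 9 boundaries and therefore 10 regions.
    """
    return hfile_split_keys[hfiles_per_region - 1::hfiles_per_region]


# Example with the numbers from the description: 100 GB, 10 GB per region.
regions, reducers = plan_counts(100, 10, 10)
fine_keys = [f"{i:03d}" for i in range(1, reducers)]  # placeholder row keys
region_keys = sample_region_keys(fine_keys, 10)
```

Because every region boundary is also one of the reducer boundaries, each hfile falls entirely inside a single region, which is why the bulk load ends up with 10 files per region rather than files that straddle regions.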

Attachments

KYLIN-1323-1.x-staging.2.patch (22/Feb/16 08:14, 24 kB, Yerui Sun)
KYLIN-1323-1.x-staging.patch (16/Feb/16 07:42, 18 kB, Yerui Sun)
KYLIN-1323-2.x-staging.2.patch (28/Feb/16 13:09, 49 kB, Yerui Sun)

People

Assignee: Yerui Sun

Reporter: Yerui Sun

Votes: 0

Watchers: 8

Dates

Created: 15/Jan/16 10:25

Updated: 26/May/16 09:16

Resolved: 11/May/16 06:33