[CARBONDATA-2018] Optimization in reading/writing for sort temp row during data loading - ASF JIRA

Details

Type: Improvement
Status: Resolved
Priority: Major
Resolution: Fixed
Affects Version/s: 1.3.0
Fix Version/s: 1.4.0
Component/s: data-load
Labels:
None

Description

SCENARIO

Currently, during the sort step of CarbonData data loading, records are partially sorted and spilled to disk. CarbonData then reads these records back and merge-sorts them.

Since the sort step is CPU-intensive, we can optimize the serialization/deserialization of these rows when writing and reading them, reducing the CPU consumed in parsing the rows.

This should enhance the data loading performance.

RESOLVE
We can pick out the non-sort fields in the row, pack them as a byte array, and skip parsing them.
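
The idea above can be illustrated with a minimal Java sketch. It is not the code from the linked pull request: the class `SortTempRowSketch`, its `TempRow` type, the method names, and the choice of two int sort keys plus a string and a double as non-sort fields are all assumptions made for illustration. It shows the non-sort fields being serialized once into a byte array, spilled as a single length-prefixed blob, and read back without being decoded, while only the sort keys are parsed for comparison.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.DataInputStream;
import java.io.DataOutputStream;
import java.io.IOException;
import java.io.UncheckedIOException;

// Illustrative sketch only: all names here are hypothetical,
// not CarbonData's actual classes or row layout.
class SortTempRowSketch {

  /** Intermediate row: parsed sort keys plus the non-sort fields as opaque bytes. */
  static final class TempRow {
    final int[] sortKeys;
    final byte[] packedNoSort;

    TempRow(int[] sortKeys, byte[] packedNoSort) {
      this.sortKeys = sortKeys;
      this.packedNoSort = packedNoSort;
    }
  }

  /** Serializes the non-sort fields once, before the row enters the sort step. */
  static byte[] packNoSortFields(String name, double measure) {
    try {
      ByteArrayOutputStream buf = new ByteArrayOutputStream();
      DataOutputStream out = new DataOutputStream(buf);
      out.writeUTF(name);
      out.writeDouble(measure);
      out.flush();
      return buf.toByteArray();
    } catch (IOException e) {
      throw new UncheckedIOException(e);
    }
  }

  /** Spill: sort keys are written one by one, the packed part as one length-prefixed blob. */
  static void writeRow(TempRow row, DataOutputStream out) throws IOException {
    out.writeInt(row.sortKeys.length);
    for (int key : row.sortKeys) {
      out.writeInt(key);
    }
    out.writeInt(row.packedNoSort.length);
    out.write(row.packedNoSort);
  }

  /** Read back: only the sort keys are parsed; the blob is copied without decoding. */
  static TempRow readRow(DataInputStream in) throws IOException {
    int[] keys = new int[in.readInt()];
    for (int i = 0; i < keys.length; i++) {
      keys[i] = in.readInt();
    }
    byte[] packed = new byte[in.readInt()];
    in.readFully(packed);
    return new TempRow(keys, packed);
  }

  /** Merge-sort comparison touches only the sort keys (assumes equal key counts). */
  static int compare(TempRow a, TempRow b) {
    for (int i = 0; i < a.sortKeys.length; i++) {
      int c = Integer.compare(a.sortKeys[i], b.sortKeys[i]);
      if (c != 0) {
        return c;
      }
    }
    return 0;
  }

  /** Round-trips a row through an in-memory buffer standing in for a sort temp file. */
  static TempRow roundTrip(TempRow row) {
    try {
      ByteArrayOutputStream buf = new ByteArrayOutputStream();
      DataOutputStream out = new DataOutputStream(buf);
      writeRow(row, out);
      out.flush();
      return readRow(new DataInputStream(new ByteArrayInputStream(buf.toByteArray())));
    } catch (IOException e) {
      throw new UncheckedIOException(e);
    }
  }
}
```

In this sketch the per-field decoding cost of the non-sort columns is paid at most once, after the merge, instead of on every spill and read; whether the actual change defers decoding in the same way is not stated in this issue.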

RESULT

I've tested it in my cluster and saw about an 8% performance gain (74MB/s/Node -> 81MB/s/Node).

Issue Links

links to

GitHub Pull Request #1792

People

Assignee: Chuanyin Xu

Reporter: Chuanyin Xu

Votes: 0

Watchers: 1

Dates

Created: 11/Jan/18 07:18

Updated: 13/Feb/18 02:36

Resolved: 12/Feb/18 08:13

Time Tracking

Estimated: Not Specified

Remaining:

Logged: 14h