[CARBONDATA-1014] Refactor on data loading and encoding override - ASF JIRA

Details

Type: Improvement
Status: Open
Priority: Major
Resolution: Unresolved
Affects Version/s: None
Fix Version/s: None
Component/s: None
Labels: None

Description

Refactor the current data loading flow so that it:
1. Uses vectorized processing as early as possible.
2. Makes the index build (sorting) CPU-cache efficient by sorting on rowId and a key column vector.
3. Opens an interface for format extension, including column encoding, compression, and statistics.

A design doc will be posted in this JIRA soon.
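
To make goal 2 concrete, here is a minimal sketch of sorting by rowId plus a key column vector. It is not CarbonData code: the class and method names (RowIdSortSketch, sortRowIds, gather) are hypothetical, and it assumes a single int key column. The idea is that the sort touches only one compact primitive array, and the other columns are reordered afterwards through the resulting rowId permutation.

```java
import java.util.Arrays;

// Hypothetical illustration only; none of these names come from CarbonData.
final class RowIdSortSketch {

    /** Returns row ids ordered by ascending key; ties keep ascending row id. */
    static int[] sortRowIds(int[] keyColumn) {
        int n = keyColumn.length;
        // Pack (key, rowId) into one long so the sort works on a single
        // contiguous primitive array instead of chasing row objects.
        long[] packed = new long[n];
        for (int rowId = 0; rowId < n; rowId++) {
            packed[rowId] = ((long) keyColumn[rowId] << 32) | (rowId & 0xFFFFFFFFL);
        }
        Arrays.sort(packed);
        int[] rowIds = new int[n];
        for (int i = 0; i < n; i++) {
            rowIds[i] = (int) packed[i]; // low 32 bits hold the row id
        }
        return rowIds;
    }

    /** Reorders any other column vector using the sorted row ids. */
    static int[] gather(int[] column, int[] rowIds) {
        int[] out = new int[rowIds.length];
        for (int i = 0; i < rowIds.length; i++) {
            out[i] = column[rowIds[i]];
        }
        return out;
    }

    public static void main(String[] args) {
        int[] keys = {30, 10, 20, 10};
        int[] measures = {300, 100, 200, 101};
        int[] rowIds = sortRowIds(keys);
        System.out.println(Arrays.toString(rowIds));                   // [1, 3, 2, 0]
        System.out.println(Arrays.toString(gather(measures, rowIds))); // [100, 101, 200, 300]
    }
}
```

Packing the key into the high 32 bits keeps the signed ordering of the key, and the row id in the low bits breaks ties, so equal keys stay in load order.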

Sub-Tasks

1. Refactor write step to use ColumnarPage
   Status: Resolved; Assignee: Jacky Li; Progress: 100%; Original Estimate: Not Specified; Time Spent: 12h 40m

2. Make sort step output ColumnPage
   Status: Open; Assignee: Unassigned

3. Add interface for column encoding and compression
   Status: Resolved; Assignee: Jacky Li; Progress: 100%; Original Estimate: Not Specified; Time Spent: 12h 50m

4. Make ColumnPage use Unsafe
   Status: Resolved; Assignee: Jacky Li; Progress: 100%; Original Estimate: Not Specified; Time Spent: 13h 50m

5. Add TablePage for data load process
   Status: Resolved; Assignee: Jacky Li; Progress: 100%; Original Estimate: Not Specified; Time Spent: 2.5h

6. change statistics to use exact type instead of Object
   Status: Resolved; Assignee: Jacky Li; Progress: 100%; Original Estimate: Not Specified; Time Spent: 12h 10m

7. Use Snappy.rawCompression on unsafe data
   Status: Resolved; Assignee: Jacky Li; Progress: 100%; Original Estimate: Not Specified; Time Spent: 9h 20m

8. Add SQL and dataframe option for encoding override
   Status: Open; Assignee: Unassigned

9. Change carbon data file definition for encoding override
   Status: Closed; Assignee: Unassigned

10. Add fixed length encoding for timestamp/date data type
    Status: Open; Assignee: Unassigned

11. Add encoding selection strategy for columns
    Status: Resolved; Assignee: Unassigned; Progress: 100%; Original Estimate: Not Specified; Time Spent: 3h 20m
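
As an illustration of what sub-tasks 3, 6, and 11 are aiming at, the following sketch shows a pluggable column codec interface together with a statistics-driven selection strategy. Every name here (ColumnCodec, DirectIntCodec, DeltaByteCodec, CodecSelector) is hypothetical and the byte layouts are invented for the example; they are not the interfaces that were added to CarbonData. The sketch assumes an int column page whose min and max are tracked as primitive ints rather than as Object.

```java
import java.nio.ByteBuffer;

// Hypothetical extension point: one codec per encoding, chosen per column page.
interface ColumnCodec {
    byte[] encode(int[] page);
    int[] decode(byte[] data);
}

// Layout: count (int), then every value as a 4-byte int.
final class DirectIntCodec implements ColumnCodec {
    public byte[] encode(int[] page) {
        ByteBuffer buf = ByteBuffer.allocate(4 + 4 * page.length);
        buf.putInt(page.length);
        for (int v : page) buf.putInt(v);
        return buf.array();
    }

    public int[] decode(byte[] data) {
        ByteBuffer buf = ByteBuffer.wrap(data);
        int[] page = new int[buf.getInt()];
        for (int i = 0; i < page.length; i++) page[i] = buf.getInt();
        return page;
    }
}

// Layout: count (int), min (int), then one byte per value holding value - min.
// Only valid when max - min <= 255; the selector below enforces that.
final class DeltaByteCodec implements ColumnCodec {
    public byte[] encode(int[] page) {
        int min = Integer.MAX_VALUE;
        for (int v : page) min = Math.min(min, v);
        ByteBuffer buf = ByteBuffer.allocate(8 + page.length);
        buf.putInt(page.length).putInt(min);
        for (int v : page) buf.put((byte) (v - min));
        return buf.array();
    }

    public int[] decode(byte[] data) {
        ByteBuffer buf = ByteBuffer.wrap(data);
        int[] page = new int[buf.getInt()];
        int min = buf.getInt();
        for (int i = 0; i < page.length; i++) page[i] = min + (buf.get() & 0xFF);
        return page;
    }
}

// Selection strategy driven by page statistics kept as primitive ints.
final class CodecSelector {
    static ColumnCodec select(int min, int max) {
        long range = (long) max - min; // widen to long to avoid int overflow
        return range <= 0xFF ? new DeltaByteCodec() : new DirectIntCodec();
    }
}
```

With this shape, a page such as {1000, 1003, 1255} has a range of 255 and would be stored in 11 bytes by DeltaByteCodec, while {0, 1000} falls back to DirectIntCodec at 12 bytes. A real strategy would likely weigh more encodings and a compression step after encoding.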

People

Assignee: Unassigned

Reporter: Jacky Li

Votes: 0

Watchers: 1

Dates

Created: 03/May/17 08:38

Updated: 23/May/18 12:00

Time Tracking

Estimated: Not Specified

Remaining: 0h

Logged: 66h 40m (including sub-tasks)