Details
Type: Improvement
Status: Open
Priority: Major
Resolution: Unresolved
Description
Refactor the current data loading flow to:
1. Use vectorized processing as early as possible.
2. Make the index build (sorting) CPU-cache efficient by using the rowId and key column vectors for sorting.
3. Open interfaces for format extension, including column encoding, compression, and statistics.
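Item 2 can be illustrated with a minimal sketch. It assumes a page's key column is held as an `int[]` (for example, dictionary surrogate keys) and that the other columns are plain arrays. The class and method names (`KeyRowIdSort`, `sortedRowIds`, `gather`) are hypothetical and are not CarbonData's API. Each key and its rowId are packed into one `long`, so the sort runs over a single contiguous primitive array instead of comparing row objects; the remaining columns are reordered afterwards by rowId.

```java
import java.util.Arrays;

/**
 * Hypothetical sketch: sort a page by one int key column without moving whole rows.
 * Each (key, rowId) pair is packed into a single long, so the sort works on one
 * contiguous primitive array (no boxing, no pointer chasing per comparison).
 */
final class KeyRowIdSort {

  /** Returns row ids ordered by key ascending; equal keys keep ascending rowId order. */
  static int[] sortedRowIds(int[] keyColumn) {
    int n = keyColumn.length;
    long[] packed = new long[n];
    for (int rowId = 0; rowId < n; rowId++) {
      // High 32 bits: the signed key, so negative keys still sort first.
      // Low 32 bits: the non-negative rowId, which acts as the tie-breaker.
      packed[rowId] = ((long) keyColumn[rowId] << 32) | (rowId & 0xFFFFFFFFL);
    }
    Arrays.sort(packed); // primitive sort over contiguous memory
    int[] rowIds = new int[n];
    for (int i = 0; i < n; i++) {
      rowIds[i] = (int) packed[i]; // the low 32 bits hold the rowId
    }
    return rowIds;
  }

  /** Reorders a non-key column by gathering its values at the sorted row ids. */
  static String[] gather(String[] column, int[] rowIds) {
    String[] out = new String[rowIds.length];
    for (int i = 0; i < rowIds.length; i++) {
      out[i] = column[rowIds[i]];
    }
    return out;
  }

  public static void main(String[] args) {
    int[] key = {30, 10, 20, 10};
    String[] name = {"d", "a", "c", "b"};
    int[] order = sortedRowIds(key);
    System.out.println(Arrays.toString(order));              // [1, 3, 2, 0]
    System.out.println(Arrays.toString(gather(name, order))); // [a, b, c, d]
  }
}
```

Packing only works for a single 32-bit key; several key columns would need a wider composite key or a comparator over rowIds, which gives up some of this locality.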
A design document will be posted in this JIRA soon.
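For item 3, one way to open the format to extension is to put encoding and compression behind small interfaces that the page writer calls in sequence. The sketch below assumes int pages and uses the JDK's `Deflater`/`Inflater` as a stand-in codec (the sub-tasks mention Snappy, a third-party library). `ColumnPageEncoder`, `PageCompressor`, `DeltaIntEncoder`, `DeflatePageCompressor`, and `PagePipeline` are hypothetical names, not CarbonData classes.

```java
import java.io.ByteArrayOutputStream;
import java.nio.ByteBuffer;
import java.util.zip.DataFormatException;
import java.util.zip.Deflater;
import java.util.zip.Inflater;

// Hypothetical extension points; names are illustrative, not CarbonData's actual API.
interface ColumnPageEncoder {
  byte[] encode(int[] page);
  int[] decode(byte[] encoded);
}

interface PageCompressor {
  byte[] compress(byte[] input);
  byte[] decompress(byte[] input);
}

/** Delta encoding: store the page length, then each value minus its predecessor. */
final class DeltaIntEncoder implements ColumnPageEncoder {
  public byte[] encode(int[] page) {
    ByteBuffer buf = ByteBuffer.allocate(4 + 4 * page.length);
    buf.putInt(page.length);
    int prev = 0;
    for (int v : page) {
      buf.putInt(v - prev); // wrap-around on overflow is undone by decode
      prev = v;
    }
    return buf.array();
  }

  public int[] decode(byte[] encoded) {
    ByteBuffer buf = ByteBuffer.wrap(encoded);
    int[] page = new int[buf.getInt()];
    int prev = 0;
    for (int i = 0; i < page.length; i++) {
      prev += buf.getInt();
      page[i] = prev;
    }
    return page;
  }
}

/** Compression backed by java.util.zip (DEFLATE). */
final class DeflatePageCompressor implements PageCompressor {
  public byte[] compress(byte[] input) {
    Deflater deflater = new Deflater();
    deflater.setInput(input);
    deflater.finish();
    ByteArrayOutputStream out = new ByteArrayOutputStream();
    byte[] chunk = new byte[4096];
    while (!deflater.finished()) {
      out.write(chunk, 0, deflater.deflate(chunk));
    }
    deflater.end();
    return out.toByteArray();
  }

  public byte[] decompress(byte[] input) {
    Inflater inflater = new Inflater();
    inflater.setInput(input);
    ByteArrayOutputStream out = new ByteArrayOutputStream();
    byte[] chunk = new byte[4096];
    try {
      while (!inflater.finished()) {
        int n = inflater.inflate(chunk);
        if (n == 0 && inflater.needsInput()) {
          throw new IllegalArgumentException("truncated compressed page");
        }
        out.write(chunk, 0, n);
      }
    } catch (DataFormatException e) {
      throw new IllegalArgumentException(e);
    } finally {
      inflater.end();
    }
    return out.toByteArray();
  }
}

/** Writer/reader that only depends on the two interfaces. */
final class PagePipeline {
  static byte[] write(int[] page, ColumnPageEncoder enc, PageCompressor comp) {
    return comp.compress(enc.encode(page));
  }

  static int[] read(byte[] stored, ColumnPageEncoder enc, PageCompressor comp) {
    return enc.decode(comp.decompress(stored));
  }
}
```

Delta encoding does not shrink the page by itself here (each difference still takes four bytes); it turns slowly increasing values into small, repetitive numbers that the compressor handles better. A new encoding or codec plugs in by implementing one interface, without changes to `PagePipeline`.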
Attachments
| # | Summary | Status | Assignee |
| 1 | Refactor write step to use ColumnarPage | Resolved | Jacky Li |
| 2 | Make sort step output ColumnPage | Open | Unassigned |
| 3 | Add interface for column encoding and compression | Resolved | Jacky Li |
| 4 | Make ColumnPage use Unsafe | Resolved | Jacky Li |
| 5 | Add TablePage for data load process | Resolved | Jacky Li |
| 6 | change statistics to use exact type instead of Object | Resolved | Jacky Li |
| 7 | Use Snappy.rawCompression on unsafe data | Resolved | Jacky Li |
| 8 | Add SQL and dataframe option for encoding override | Open | Unassigned |
| 9 | Change carbon data file definition for encoding override | Closed | Unassigned |
| 10 | Add fixed length encoding for timestamp/date data type | Open | Unassigned |
| 11 | Add encoding selection strategy for columns | Resolved | Unassigned |
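Sub-tasks 6 and 11 (statistics with exact types, and an encoding selection strategy) fit together: primitive min/max statistics collected per page can drive the choice of encoding. The sketch assumes a page of `long` values (timestamps in milliseconds, say) and uses frame-of-reference encoding, storing each value as `value - min` in the narrowest fixed width that fits. This is an illustrative strategy only; `LongPageStats` and `EncodingSelector` are hypothetical names, not the strategy actually committed under sub-task 11.

```java
/** Hypothetical page statistics kept as primitive longs instead of boxed Objects. */
final class LongPageStats {
  long min = Long.MAX_VALUE;
  long max = Long.MIN_VALUE;
  int count;

  void update(long value) {
    if (value < min) min = value;
    if (value > max) max = value;
    count++;
  }
}

/**
 * Hypothetical selection strategy (frame-of-reference): each value would be stored as
 * an unsigned offset (value - min), using the narrowest width that can hold max - min.
 */
final class EncodingSelector {
  static int bytesPerValue(LongPageStats stats) {
    if (stats.count == 0) {
      return 0; // empty page: nothing to store
    }
    long range;
    try {
      range = Math.subtractExact(stats.max, stats.min);
    } catch (ArithmeticException overflow) {
      return 8; // the range does not fit in a signed long: keep plain 8-byte values
    }
    if (range <= 0xFFL) return 1;       // unsigned byte
    if (range <= 0xFFFFL) return 2;     // unsigned short
    if (range <= 0xFFFFFFFFL) return 4; // unsigned int
    return 8;
  }
}
```

With typed statistics the selector compares `long` values directly; with `Object` statistics it would need casts or boxing for every page. For example, a page of millisecond timestamps spanning one day has a range below 2^32, so this strategy would pick 4 bytes per value instead of 8.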