Details
- Type: Task
- Status: Open
- Priority: Critical
- Resolution: Unresolved
- Version: 1.5.0
Description
Some users write their projects in C++ and cannot call the CarbonData interface or integrate CarbonData into their C++ projects. We therefore plan to provide a C++ interface so that C++ users can integrate CarbonData, including reading and writing CarbonData files. This will make integration much more convenient for them.
We plan to design and develop the following:
1. Provide a CarbonReader in the SDK that can read CarbonData in C++.
## Features/interfaces
1.1. Create a CarbonReader
1.2. hasNext()
1.3. readNextRow()
1.4. close()
1.5. Support OBS (AK/SK/Endpoint)
1.6. Support batch read (withBatch, readNextBatchRow)
1.7. Support vector read (default) and CarbonRecordReader (withRowRecordReader)
1.8. Projection
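The reader interface above follows an iterator-style protocol: build a reader, optionally set a projection and a batch size, loop while hasNext(), fetch rows with readNextRow() (or several at once with readNextBatchRow()), then close(). The sketch below illustrates that call pattern with a hypothetical in-memory `MockCarbonReader`. It does not use JNI or the real CSDK classes, and its row representation (a vector of strings), projection handling and batch semantics are assumptions for illustration only.

```cpp
#include <cassert>
#include <cstddef>
#include <string>
#include <utility>
#include <vector>

// Hypothetical row type: every value rendered as a string.
using Row = std::vector<std::string>;

// Hypothetical stand-in for the C++ CarbonReader: rows live in memory
// instead of being read from carbon files through JNI.
class MockCarbonReader {
 public:
  MockCarbonReader(std::vector<std::string> columns, std::vector<Row> rows)
      : columns_(std::move(columns)), rows_(std::move(rows)) {}

  // Projection: keep only the named columns, in the requested order.
  // Returns false if a requested column does not exist.
  bool projection(const std::vector<std::string>& wanted) {
    std::vector<std::size_t> idx;
    for (const auto& name : wanted) {
      std::size_t i = 0;
      while (i < columns_.size() && columns_[i] != name) ++i;
      if (i == columns_.size()) return false;
      idx.push_back(i);
    }
    projection_ = std::move(idx);
    return true;
  }

  // Batch size used by readNextBatchRow (mirrors the idea of withBatch).
  void withBatch(std::size_t n) { batch_ = n == 0 ? 1 : n; }

  bool hasNext() const { return !closed_ && pos_ < rows_.size(); }

  // Returns the next row, restricted to the projected columns if any.
  Row readNextRow() {
    assert(hasNext());
    return project(rows_[pos_++]);
  }

  // Returns up to batch_ rows; fewer when the input is nearly exhausted.
  std::vector<Row> readNextBatchRow() {
    std::vector<Row> out;
    while (hasNext() && out.size() < batch_) out.push_back(readNextRow());
    return out;
  }

  void close() { closed_ = true; }

 private:
  Row project(const Row& r) const {
    if (projection_.empty()) return r;  // no projection: all columns
    Row out;
    for (std::size_t i : projection_) out.push_back(r[i]);
    return out;
  }

  std::vector<std::string> columns_;
  std::vector<Row> rows_;
  std::vector<std::size_t> projection_;
  std::size_t pos_ = 0;
  std::size_t batch_ = 1;
  bool closed_ = false;
};

// Typical call pattern: read row by row until exhausted, then close.
std::size_t countRows(MockCarbonReader& reader) {
  std::size_t n = 0;
  while (reader.hasNext()) {
    reader.readNextRow();
    ++n;
  }
  reader.close();
  return n;
}
```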
## Supported data types:
String, Long, Varchar (string), Short, Int, Date (int), Timestamp (long), Boolean, Decimal (string), Float
Array<String>: supported by CarbonRecordReader, not by the vector reader
Byte: supported in Java RowUtil, not in the C++ CarbonReader
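In the list above, Date is returned as an int and Timestamp as a long. The sketch below shows how a C++ caller might turn those raw values into readable strings. It assumes the int counts days since 1970-01-01 and the long counts a fixed number of units per second since the Unix epoch in UTC; both units are assumptions to check against the SDK documentation, so the timestamp unit is a parameter. The day-to-calendar step is the well-known civil-from-days algorithm, which avoids any dependency on the local time zone. The function names are hypothetical.

```cpp
#include <cassert>
#include <cstdint>
#include <cstdio>
#include <string>

// Civil-from-days: converts a day count relative to 1970-01-01 into a
// proleptic Gregorian year/month/day. Handles negative counts (pre-1970).
void civilFromDays(std::int64_t days, std::int64_t& y, unsigned& m, unsigned& d) {
  days += 719468;  // shift the epoch to 0000-03-01
  const std::int64_t era = (days >= 0 ? days : days - 146096) / 146097;
  const unsigned doe = static_cast<unsigned>(days - era * 146097);             // day of era
  const unsigned yoe = (doe - doe / 1460 + doe / 36524 - doe / 146096) / 365;  // year of era
  const unsigned doy = doe - (365 * yoe + yoe / 4 - yoe / 100);                // day of year from March 1
  const unsigned mp = (5 * doy + 2) / 153;                                     // month, March = 0
  d = doy - (153 * mp + 2) / 5 + 1;
  m = mp < 10 ? mp + 3 : mp - 9;
  y = static_cast<std::int64_t>(yoe) + era * 400 + (m <= 2 ? 1 : 0);
}

// Date column (assumed: int = days since 1970-01-01) -> "YYYY-MM-DD".
std::string formatDate(std::int32_t daysSinceEpoch) {
  std::int64_t y;
  unsigned m, d;
  civilFromDays(daysSinceEpoch, y, m, d);
  char buf[32];
  std::snprintf(buf, sizeof buf, "%04lld-%02u-%02u", static_cast<long long>(y), m, d);
  return buf;
}

// Timestamp column (assumed: long in units of 1/unitsPerSecond seconds since
// the epoch, UTC) -> "YYYY-MM-DD HH:MM:SS". The sub-second part is dropped.
std::string formatTimestamp(std::int64_t value, std::int64_t unitsPerSecond) {
  assert(unitsPerSecond > 0);
  std::int64_t secs = value / unitsPerSecond;
  if (value % unitsPerSecond < 0) --secs;  // floor toward negative infinity
  std::int64_t days = secs / 86400;
  std::int64_t sod = secs % 86400;  // seconds of day
  if (sod < 0) {
    sod += 86400;
    --days;
  }
  std::int64_t y;
  unsigned m, d;
  civilFromDays(days, y, m, d);
  char buf[48];
  std::snprintf(buf, sizeof buf, "%04lld-%02u-%02u %02lld:%02lld:%02lld",
                static_cast<long long>(y), m, d,
                static_cast<long long>(sod / 3600),
                static_cast<long long>(sod % 3600 / 60),
                static_cast<long long>(sod % 60));
  return buf;
}
```

Under the microsecond assumption, `formatTimestamp(1538015497000000, 1000000)` yields `2018-09-27 02:31:37`.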
Schema and data:
Create table tbl_email_form_to_for_XX(
  Event_Time Timestamp,
  Ingestion_Time Timestamp,
  From_Email String,
  To_Email String,
  From_To_type String,
  Event_ID String
) using carbon options(path 'obs://X/tbl_email_form_to_for_XX')
This table is produced by an ETL job that keeps 6 of the 18 columns of the source table.
Example data:
from_email_36550_phillip.allen@enron.com to_email_36550_stagecoachmama@hotmail.com from_to <29528303.1075855666657.JavaMail.evans@thyme> 1538015497000000 9755149200000
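When a table like this is written through the SDK, its schema is passed to the writer with withCsvInput (item 3.1). The sketch below assembles such a schema string from column name/type pairs. It assumes the compact `[{name:type},...]` notation used in the CarbonData Java SDK examples (for instance `[{name:string},{age:int}]`); the helper names `toCsvInputSchema` and `emailTableColumns` are hypothetical.

```cpp
#include <cassert>
#include <cstddef>
#include <string>
#include <utility>
#include <vector>

// Hypothetical helper: builds a schema string in the assumed
// "[{column:type},{column:type}]" notation accepted by withCsvInput.
std::string toCsvInputSchema(
    const std::vector<std::pair<std::string, std::string>>& columns) {
  assert(!columns.empty());
  std::string out = "[";
  for (std::size_t i = 0; i < columns.size(); ++i) {
    if (i > 0) out += ",";
    out += "{" + columns[i].first + ":" + columns[i].second + "}";
  }
  out += "]";
  return out;
}

// The six columns of tbl_email_form_to_for_XX, with lower-case type names.
std::vector<std::pair<std::string, std::string>> emailTableColumns() {
  return {{"Event_Time", "timestamp"}, {"Ingestion_Time", "timestamp"},
          {"From_Email", "string"},    {"To_Email", "string"},
          {"From_To_type", "string"},  {"Event_ID", "string"}};
}
```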
2. Read performance should reach X million records/s per node.
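The target in item 2 is expressed in records per second per node, so one way to check it is to time a complete read loop on a single node and divide the row count by the elapsed wall-clock time. The sketch below does this for any callable that returns false once the input is exhausted; `measureRowsPerSecond` and `Throughput` are hypothetical names, and in practice the callable would wrap the hasNext()/readNextRow() loop of the C++ CarbonReader.

```cpp
#include <cassert>
#include <chrono>
#include <cstdint>
#include <functional>

// Result of one timed read loop.
struct Throughput {
  std::uint64_t rows = 0;
  double seconds = 0.0;
  double rowsPerSecond() const {
    return seconds > 0.0 ? static_cast<double>(rows) / seconds : 0.0;
  }
};

// Calls readOne until it returns false, counting rows and timing the loop
// with a monotonic clock so wall-clock adjustments do not skew the result.
Throughput measureRowsPerSecond(const std::function<bool()>& readOne) {
  const auto start = std::chrono::steady_clock::now();
  Throughput t;
  while (readOne()) ++t.rows;
  const auto end = std::chrono::steady_clock::now();
  t.seconds = std::chrono::duration<double>(end - start).count();
  return t;
}
```

Timing only the read loop, not reader construction, keeps one-off setup costs such as JVM start-up out of the per-record figure; whether the target is meant to include them is not stated above.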
3. Provide a CarbonWriter in the SDK that can write CarbonData in C++.
## Features/interfaces
3.1. Create a CarbonWriter: create the schema (withCsvInput), set the output path (outputPath) and build it (build)
3.2. write()
3.3. close()
3.4. Support OBS (AK/SK/Endpoint) (withHadoopConf)
3.5. writtenBy
3.6. Support withTableProperty, withLoadOption, taskNo, uniqueIdentifier, withThreadSafe, withBlockSize, withBlockletSize, localDictionaryThreshold and enableLocalDictionary in the C++ SDK (PR2899, to be reviewed)
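The writer features above follow the builder pattern of the Java SDK: configure a builder (output path, schema, options such as block size or table properties), call build(), then write() rows and close(). The sketch below models that flow with hypothetical `MockWriterBuilder` and `MockCarbonWriter` classes that only keep rows and settings in memory. Method names echo the list above, but the parameter types, the simplified column-name schema and the option keys (`blockSize`, `writtenBy`) are assumptions, not the CSDK API.

```cpp
#include <cassert>
#include <cstddef>
#include <map>
#include <stdexcept>
#include <string>
#include <utility>
#include <vector>

using Row = std::vector<std::string>;

// Hypothetical writer: keeps rows in memory instead of writing carbon files.
class MockCarbonWriter {
 public:
  MockCarbonWriter(std::string path, std::size_t columnCount,
                   std::map<std::string, std::string> options)
      : path_(std::move(path)), columnCount_(columnCount), options_(std::move(options)) {
    assert(columnCount_ > 0);
  }

  // Rejects writes after close() and rows whose width differs from the schema.
  void write(const Row& row) {
    if (closed_) throw std::logic_error("writer already closed");
    if (row.size() != columnCount_) throw std::invalid_argument("column count mismatch");
    rows_.push_back(row);
  }

  void close() { closed_ = true; }  // a real writer would flush its files here

  std::size_t rowCount() const { return rows_.size(); }
  const std::string& path() const { return path_; }
  const std::string& option(const std::string& key) const { return options_.at(key); }

 private:
  std::string path_;
  std::size_t columnCount_;
  std::map<std::string, std::string> options_;
  std::vector<Row> rows_;
  bool closed_ = false;
};

// Hypothetical builder: each setter returns *this so calls can be chained.
class MockWriterBuilder {
 public:
  MockWriterBuilder& outputPath(std::string p) { path_ = std::move(p); return *this; }
  // Simplified: column names only, instead of a typed schema string.
  MockWriterBuilder& withCsvInput(std::vector<std::string> cols) { columns_ = std::move(cols); return *this; }
  MockWriterBuilder& withBlockSize(int mb) { options_["blockSize"] = std::to_string(mb); return *this; }
  MockWriterBuilder& withTableProperty(const std::string& k, const std::string& v) { options_[k] = v; return *this; }
  MockWriterBuilder& writtenBy(std::string app) { options_["writtenBy"] = std::move(app); return *this; }

  // build() validates the mandatory settings before creating the writer.
  MockCarbonWriter build() const {
    if (path_.empty() || columns_.empty())
      throw std::invalid_argument("outputPath and schema are required");
    return MockCarbonWriter(path_, columns_.size(), options_);
  }

 private:
  std::string path_;
  std::vector<std::string> columns_;
  std::map<std::string, std::string> options_;
};
```

Validating in build() means a missing output path or schema fails once, up front, instead of on the first write().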
## Data types:
CarbonData needs to support the basic data types: string, float, double, int, long, date, timestamp, bool and array<String>.
Other types can be converted:
char array => carbon string
enum => carbon string
set and list => carbon array<String>
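The conversions above can be done on the C++ side before calling write(). The sketch below shows one way: a fixed-size char array becomes a std::string, an enum is written as its enumerator name, and a std::set or std::list is copied into a vector of strings for an array<String> column. The enum `Priority` and all function names are hypothetical, and how the SDK expects array elements to be delimited inside a row is not covered here.

```cpp
#include <cassert>
#include <cstddef>
#include <list>
#include <set>
#include <string>
#include <vector>

// char array => carbon string: copy up to the first NUL or maxLen bytes,
// so arrays that are not NUL-terminated are handled safely.
std::string fromCharArray(const char* data, std::size_t maxLen) {
  assert(data != nullptr);
  std::size_t n = 0;
  while (n < maxLen && data[n] != '\0') ++n;
  return std::string(data, n);
}

// Hypothetical enum used only for illustration.
enum class Priority { Low, Medium, High };

// enum => carbon string: store the enumerator name rather than its number,
// so stored values stay meaningful if the enum is reordered later.
std::string toCarbonString(Priority p) {
  switch (p) {
    case Priority::Low: return "Low";
    case Priority::Medium: return "Medium";
    case Priority::High: return "High";
  }
  return "Unknown";
}

// set / list => carbon array<String>: works for any container whose
// elements are convertible to std::string, preserving iteration order.
template <typename Container>
std::vector<std::string> toCarbonArray(const Container& c) {
  return std::vector<std::string>(c.begin(), c.end());
}
```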
## Performance
Write performance is not a requirement for now.
4. Read schema
readSchema
getVersionDetails => TODO
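readSchema returns the schema stored in a carbon file, which a caller can use, for example, to check that projection columns exist before building a reader. The sketch below uses a hypothetical `Field` struct (name and type) as a stand-in for whatever readSchema returns, plus a hypothetical `missingColumns` helper. Column names are compared case-insensitively here, which is an assumption about CarbonData's naming rules rather than confirmed behavior.

```cpp
#include <algorithm>
#include <cassert>
#include <cctype>
#include <string>
#include <vector>

// Hypothetical stand-in for one field of the schema returned by readSchema.
struct Field {
  std::string name;
  std::string type;
};

std::string toLower(std::string s) {
  std::transform(s.begin(), s.end(), s.begin(),
                 [](unsigned char c) { return static_cast<char>(std::tolower(c)); });
  return s;
}

// Returns the projection columns that do not appear in the schema,
// comparing names case-insensitively.
std::vector<std::string> missingColumns(const std::vector<Field>& schema,
                                        const std::vector<std::string>& projection) {
  std::vector<std::string> missing;
  for (const auto& col : projection) {
    const std::string key = toLower(col);
    const bool found = std::any_of(schema.begin(), schema.end(),
                                   [&](const Field& f) { return toLower(f.name) == key; });
    if (!found) missing.push_back(col);
  }
  return missing;
}
```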
5. Support CarbonProperties
5.1. addProperty
5.2. getProperty
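addProperty and getProperty expose CarbonData's global key-value settings to C++ callers. The sketch below is a hypothetical in-process stand-in, shaped like the singleton accessed through CarbonProperties.getInstance() in the Java code, that shows the expected semantics: a later addProperty overwrites an earlier one, and getProperty falls back to a caller-supplied default when the key is absent. It does not call the real Java CarbonProperties through JNI, and the class name `MockCarbonProperties` is an assumption.

```cpp
#include <cassert>
#include <map>
#include <mutex>
#include <string>

// Hypothetical process-wide property store.
class MockCarbonProperties {
 public:
  static MockCarbonProperties& getInstance() {
    static MockCarbonProperties instance;  // initialization is thread-safe since C++11
    return instance;
  }

  // Adds or overwrites a property.
  void addProperty(const std::string& key, const std::string& value) {
    std::lock_guard<std::mutex> lock(mu_);
    props_[key] = value;
  }

  // Returns the stored value, or defaultValue if the key was never added.
  std::string getProperty(const std::string& key,
                          const std::string& defaultValue = "") const {
    std::lock_guard<std::mutex> lock(mu_);
    auto it = props_.find(key);
    return it == props_.end() ? defaultValue : it->second;
  }

 private:
  MockCarbonProperties() = default;
  mutable std::mutex mu_;  // guards props_ for concurrent callers
  std::map<std::string, std::string> props_;
};
```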
6. TODO:
6.1. getVersionDetails => to be reviewed
6.2. Update the SDK/CSDK reader documentation => to be reviewed
6.3. Support byte (write and read)
6.4. Support long string columns
6.5. Support sortBy => to be reviewed
6.6. Support withCsvInput(Schema schema); create the schema (Java)
6.7. Optimize the write documentation => to be reviewed
/**
 * Create a {@link CarbonWriterBuilder} to build a {@link CarbonWriter}
 */
public static CarbonWriterBuilder builder() {
  return new CarbonWriterBuilder();
}
Issue Links
is a parent of:
- CARBONDATA-2996 readSchemaInIndexFile can't read schema by folder path (Resolved)
- CARBONDATA-3108 Jvm will crash when CarbonRow use wrong index number in CSDK (Resolved)
- CARBONDATA-3216 There are some bugs in CSDK (Resolved)
| # | Sub-task | Status | Assignee |
|---|---|---|---|
| 1 | Provide CarbonReader C++ interface for SDK | Resolved | Bo Xu |
| 2 | Support read primitive data type in CSDK | Resolved | Bo Xu |
| 3 | Support read schema from index file and data file in CSDK | Resolved | Bo Xu |
| 4 | support read schema from S3 | Resolved | Bo Xu |
| 5 | Provide C++ interface for writing carbon data | Resolved | Bo Xu |
| 6 | Support read batch row in CSDK | Resolved | Bo Xu |
| 7 | Add test framework for CSDK | Open | Babulal |
| 8 | Handle exception in CSDK | Resolved | Bo Xu |
| 9 | Support set carbon property in CSDK | Resolved | Bo Xu |
| 10 | Support other interface in carbon writer of C++ SDK | Resolved | Bo Xu |
| 11 | Improve the C++ SDK read performance by merging column in JNI | Open | Bo Xu |
| 12 | Optimize the documentation of SDK/CSDK | Resolved | Bo Xu |
| 13 | Support folder path in getVersionDetails and support getVersionDetails in CSDK | Open | Bo Xu |
| 14 | Support get length from CarbonRow in CSDK | Open | Bo Xu |