  1. CarbonData
  2. CARBONDATA-2951

CSDK: Provide C++ interface for SDK

    Details

    • Type: Task
    • Status: Open
    • Priority: Critical
    • Resolution: Unresolved
    • Affects Version/s: 1.5.0
    • Fix Version/s: NONE
    • Component/s: other
    • Labels:
      None

      Description

      Users whose projects are written in C++ currently cannot call the CarbonData interface or integrate CarbonData into their code. We plan to provide a C++ interface to the SDK that covers both reading and writing CarbonData, which will make integration more convenient for them.

      We plan to design and develop the following:

      1. Provide a CarbonReader for the SDK that can read carbon data from C++
      ##features/interfaces
      1.1. create CarbonReader
      1.2. hasNext()
      1.3. readNextRow()
      1.4. close()
      1.5. support OBS(AK/SK/Endpoint)
      1.6. support batch read (withBatch, readNextBatchRow)
      1.7. support vector read (default) and carbonrecordreader (withRowRecordReader)
      1.8. projection
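
      The reader operations listed above (hasNext, readNextRow, readNextBatchRow, close, projection) can be sketched with an in-memory stand-in. This is only an illustration of the iteration contract, not the CSDK itself: MockCarbonReader, Row and countRows are hypothetical names, and passing the projection and batch size to the constructor is an assumption that replaces the real builder calls.

```cpp
#include <cassert>
#include <cstddef>
#include <stdexcept>
#include <string>
#include <utility>
#include <vector>

// Hypothetical in-memory stand-in for a reader; NOT the real CSDK class.
using Row = std::vector<std::string>;

class MockCarbonReader {
public:
    // rows: the full table; projection: indices of the columns to return;
    // batchSize: maximum number of rows returned by readNextBatchRow().
    MockCarbonReader(std::vector<Row> rows, std::vector<std::size_t> projection,
                     std::size_t batchSize)
        : rows_(std::move(rows)), projection_(std::move(projection)),
          batchSize_(batchSize) {
        assert(batchSize_ > 0 && "batch size must be positive");
    }

    // True while the reader is open and unread rows remain.
    bool hasNext() const { return !closed_ && next_ < rows_.size(); }

    // Returns the next row, restricted to the projected columns.
    Row readNextRow() {
        if (!hasNext()) throw std::out_of_range("no more rows");
        const Row& full = rows_[next_++];
        Row out;
        for (std::size_t col : projection_) out.push_back(full.at(col));
        return out;
    }

    // Returns up to batchSize rows in one call.
    std::vector<Row> readNextBatchRow() {
        std::vector<Row> batch;
        while (hasNext() && batch.size() < batchSize_) {
            batch.push_back(readNextRow());
        }
        return batch;
    }

    // After close(), hasNext() reports false.
    void close() { closed_ = true; }

private:
    std::vector<Row> rows_;
    std::vector<std::size_t> projection_;
    std::size_t batchSize_;
    std::size_t next_ = 0;
    bool closed_ = false;
};

// Drains the reader batch by batch, closes it, and returns the row count.
std::size_t countRows(MockCarbonReader& reader) {
    std::size_t n = 0;
    while (reader.hasNext()) n += reader.readNextBatchRow().size();
    reader.close();
    return n;
}
```

      Reading in batches keeps the loop shape of the row-by-row case while reducing the number of calls, which matters when each call crosses a language boundary.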

      ##supported data types:
      String, Long, Varchar (string), Short, Int, Date (int), Timestamp (long), Boolean, Decimal (string), Float
      Array<String>: supported in carbonrecordreader, not in vectorreader
      byte: supported in Java RowUtil, not in the C++ carbon reader

        1. Schema and data
          Create table tbl_email_form_to_for_XX(
          Event_Time Timestamp,
          Ingestion_Time Timestamp,
          From_Email String,
          To_Email String,
          From_To_type String,
          Event_ID String
          ) using carbon options(path 'obs://X/tbl_email_form_to_for_XX')
          ETL of 6 columns from an 18-column table

      example data:
      from_email_36550_phillip.allen@enron.com to_email_36550_stagecoachmama@hotmail.com from_to <29528303.1075855666657.JavaMail.evans@thyme> 1538015497000000 9755149200000
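
      Since Timestamp and Date reach C++ as long and int, the caller has to decode them. The sketch below assumes that a timestamp is microseconds since 1970-01-01 UTC (the magnitude of 1538015497000000 in the example row suggests this) and that a date is days since 1970-01-01. Both units are assumptions that should be checked against the SDK documentation; formatUtc, formatTimestampMicros and formatDateDays are hypothetical helper names.

```cpp
#include <cstddef>
#include <cstdint>
#include <ctime>
#include <string>

// Formats seconds since the Unix epoch as UTC using a strftime pattern.
// Returns an empty string if the value cannot be converted.
std::string formatUtc(std::time_t seconds, const char* pattern) {
    const std::tm* utc = std::gmtime(&seconds);  // note: gmtime is not thread-safe
    if (utc == nullptr) return std::string();
    char buf[32];
    const std::size_t n = std::strftime(buf, sizeof buf, pattern, utc);
    return std::string(buf, n);
}

// Assumption: timestamp(long) holds microseconds since 1970-01-01 UTC.
std::string formatTimestampMicros(std::int64_t micros) {
    return formatUtc(static_cast<std::time_t>(micros / 1000000),
                     "%Y-%m-%d %H:%M:%S");
}

// Assumption: Date(int) holds days since 1970-01-01.
std::string formatDateDays(std::int32_t days) {
    return formatUtc(static_cast<std::time_t>(days) * 86400, "%Y-%m-%d");
}
```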

      2. Performance should reach X million records/s/node

      3. Provide a CarbonWriter for the SDK that can write carbon data from C++
      ##features/interfaces
      3.1. create CarbonWriter, including creating the schema (withCsvInput), setting outputPath, and build
      3.2. write()
      3.3. close()
      3.4. support OBS(AK/SK/Endpoint)(withHadoopConf)
      3.5. writtenBy
      3.6. support withTableProperty, withLoadOption, taskNo, uniqueIdentifier, withThreadSafe, withBlockSize, withBlockletSize, localDictionaryThreshold, enableLocalDictionary in the C++ SDK (PR2899, to be reviewed)

      ##Data types:
      Carbon needs to support the basic data types: string, float, double, int, long, date, timestamp, bool, array<String>.
      Other types can be converted:
      char array => carbon string
      Enum => Carbon string
      set and list => carbon array<String>
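
      The three conversions above can be done on the caller's side before a row is handed to the writer. The following is a minimal sketch using only the standard library; FromToType, toCarbonString and toCarbonArray are hypothetical names chosen for illustration, and std::vector<std::string> stands in for carbon array<String>.

```cpp
#include <cstddef>
#include <list>
#include <set>
#include <string>
#include <vector>

// Hypothetical enum used only for this illustration.
enum class FromToType { From, To };

// Enum => string: an explicit mapping, since C++ enums carry no names at run time.
std::string toCarbonString(FromToType t) {
    switch (t) {
        case FromToType::From: return "from";
        case FromToType::To:   return "to";
    }
    return std::string();
}

// char array => string: a fixed-size buffer may lack a terminating NUL,
// so stop at the first NUL or at the end of the buffer, whichever comes first.
template <std::size_t N>
std::string toCarbonString(const char (&buf)[N]) {
    std::size_t len = 0;
    while (len < N && buf[len] != '\0') ++len;
    return std::string(buf, len);
}

// set and list => array<String>: copy the elements in iteration order.
template <typename Container>
std::vector<std::string> toCarbonArray(const Container& c) {
    return std::vector<std::string>(c.begin(), c.end());
}
```

      Note that a std::set yields its elements in sorted order, so the resulting array does not preserve insertion order.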

      ##performance
      Write performance is not a requirement for now

      4. read schema function
      readSchema
      getVersionDetails =>TODO

      5. support carbonproperties
      5.1 addProperty
      5.2 getProperty

      6. TODO:
      6.1. getVersionDetails => to be reviewed
      6.2. update the SDK/CSDK reader doc => to be reviewed
      6.3. support byte (write and read)
      6.4. support long string columns
      6.5. support sortBy => to be reviewed
      6.6. support withCsvInput(Schema schema); create schema (Java)
      6.7. optimize the write doc => to be reviewed
      /**
       * Create a {@link CarbonWriterBuilder} to build a {@link CarbonWriter}
       */
      public static CarbonWriterBuilder builder() {
        return new CarbonWriterBuilder();
      }

              People

              • Assignee: xubo245 Bo Xu
              • Reporter: xubo245 Bo Xu
              • Votes: 0
              • Watchers: 1

                  Time Tracking

                  • Original Estimate: Not Specified
                  • Remaining Estimate: 0h
                  • Time Spent: 144h 20m