CarbonData / CARBONDATA-2018

Optimization in reading/writing for sort temp row during data loading


Details

    • Type: Improvement
    • Status: Resolved
    • Priority: Major
    • Resolution: Fixed
    • Affects Version/s: 1.3.0
    • Fix Version/s: 1.4.0
    • Component/s: data-load
    • Labels: None

    Description

      1. SCENARIO

      Currently, during the sort step of CarbonData data loading, records are partially sorted and spilled to disk. CarbonData then reads these records back and merge-sorts them.

      Since the sort step is CPU-intensive, we can optimize the serialization and deserialization of these rows when writing and reading them, which reduces the CPU time spent parsing them.

      This should enhance the data loading performance.
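The spill-and-merge flow described above can be sketched in Java as follows. This is a minimal illustration under stated assumptions, not CarbonData's code: each row is reduced to a single int key, each spilled run is an in-memory byte array standing in for a sort temp file, and the names `SpillMergeSketch`, `sortAndSpill` and `mergeRuns` are hypothetical.

```java
import java.io.*;
import java.util.*;

// Hypothetical sketch: sort rows partially in memory, "spill" each sorted run,
// then read the runs back and combine them with a k-way merge sort.
// Spill files are simulated with in-memory byte arrays for brevity.
final class SpillMergeSketch {

    // Sort one in-memory chunk and serialize it as a run: count, then values.
    static byte[] sortAndSpill(int[] chunk) {
        int[] sorted = chunk.clone();
        Arrays.sort(sorted);
        ByteArrayOutputStream bytes = new ByteArrayOutputStream();
        try (DataOutputStream out = new DataOutputStream(bytes)) {
            out.writeInt(sorted.length);
            for (int v : sorted) {
                out.writeInt(v);
            }
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return bytes.toByteArray();
    }

    // Reader over one spilled run; `head` holds the value most recently read.
    private static final class RunReader {
        private final DataInputStream in;
        private int remaining;
        int head;

        RunReader(byte[] run) {
            in = new DataInputStream(new ByteArrayInputStream(run));
            try {
                remaining = in.readInt();
            } catch (IOException e) {
                throw new UncheckedIOException(e);
            }
        }

        // Deserialize the next value; returns false once the run is exhausted.
        boolean advance() {
            if (remaining == 0) {
                return false;
            }
            try {
                head = in.readInt();
            } catch (IOException e) {
                throw new UncheckedIOException(e);
            }
            remaining--;
            return true;
        }
    }

    // Merge all sorted runs into one sorted list using a min-heap of run heads.
    static List<Integer> mergeRuns(List<byte[]> runs) {
        PriorityQueue<RunReader> heap =
            new PriorityQueue<>(Comparator.comparingInt((RunReader r) -> r.head));
        for (byte[] run : runs) {
            RunReader reader = new RunReader(run);
            if (reader.advance()) {
                heap.add(reader);
            }
        }
        List<Integer> merged = new ArrayList<>();
        while (!heap.isEmpty()) {
            RunReader smallest = heap.poll();
            merged.add(smallest.head);
            if (smallest.advance()) {
                heap.add(smallest);
            }
        }
        return merged;
    }
}
```

In this sketch every row of every run is deserialized again during the merge, so whatever it costs to parse a row is paid once when spilling and once more when reading back.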

      2. RESOLVE

      We can pick up the unsorted fields in each row, pack them into a single byte array, and skip parsing them.
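A minimal sketch of this idea, assuming a row has int sort columns plus a string and a double that are not sorted on. The class `PackedSortTempRow` and its methods are hypothetical illustrations, not CarbonData's actual classes: only the sort keys are parsed on write and read, while the remaining fields travel as one length-prefixed byte array that is copied with `readFully` and never decoded.

```java
import java.io.*;
import java.util.*;

// Hypothetical sketch (not CarbonData's actual classes): a sort temp row keeps
// only the sort columns as parsed values and packs every other column into one
// opaque byte[] that is copied through spill and merge without being parsed.
final class PackedSortTempRow {
    final int[] sortKeys;            // parsed: the comparator needs them
    final byte[] packedNoSortFields; // unparsed: only copied around

    PackedSortTempRow(int[] sortKeys, byte[] packedNoSortFields) {
        this.sortKeys = sortKeys;
        this.packedNoSortFields = packedNoSortFields;
    }

    // Pack the non-sort columns once, when the row enters the sort step.
    static byte[] packNoSortFields(String name, double amount) {
        ByteArrayOutputStream bytes = new ByteArrayOutputStream();
        try (DataOutputStream out = new DataOutputStream(bytes)) {
            out.writeUTF(name);
            out.writeDouble(amount);
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return bytes.toByteArray();
    }

    // Spill: write each sort key, then the packed fields as a length-prefixed blob.
    void writeTo(DataOutputStream out) throws IOException {
        out.writeInt(sortKeys.length);
        for (int key : sortKeys) {
            out.writeInt(key);
        }
        out.writeInt(packedNoSortFields.length);
        out.write(packedNoSortFields);
    }

    // Read back: parse the sort keys, but copy the blob with readFully only.
    static PackedSortTempRow readFrom(DataInputStream in) throws IOException {
        int[] keys = new int[in.readInt()];
        for (int i = 0; i < keys.length; i++) {
            keys[i] = in.readInt();
        }
        byte[] blob = new byte[in.readInt()];
        in.readFully(blob);
        return new PackedSortTempRow(keys, blob);
    }

    // Ordering depends on the sort keys alone; the blob is never inspected.
    static final Comparator<PackedSortTempRow> BY_SORT_KEYS =
        (a, b) -> Arrays.compare(a.sortKeys, b.sortKeys);

    // Serialize a batch of rows, as a spill step would.
    static byte[] spill(List<PackedSortTempRow> rows) {
        ByteArrayOutputStream bytes = new ByteArrayOutputStream();
        try (DataOutputStream out = new DataOutputStream(bytes)) {
            out.writeInt(rows.size());
            for (PackedSortTempRow row : rows) {
                row.writeTo(out);
            }
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return bytes.toByteArray();
    }

    // Deserialize a spilled batch; only the sort keys are decoded.
    static List<PackedSortTempRow> load(byte[] spilled) {
        try (DataInputStream in = new DataInputStream(new ByteArrayInputStream(spilled))) {
            int count = in.readInt();
            List<PackedSortTempRow> rows = new ArrayList<>(count);
            for (int i = 0; i < count; i++) {
                rows.add(readFrom(in));
            }
            return rows;
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }
}
```

Because the comparator reads only `sortKeys`, in this sketch the per-row parsing work during spill and merge does not grow with the number of non-sort columns; the packed bytes would be decoded once, when the sorted rows are finally written out. `Arrays.compare(int[], int[])` requires Java 9 or later.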
      3. RESULT

      I tested it in my cluster and saw a performance gain of about 8% (74 MB/s/node -> 81 MB/s/node).
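For reference, the two throughput figures can be compared either as a throughput increase or as a reduction in load time per megabyte:

$$\frac{81 - 74}{74} \approx 9.5\%\ \text{(throughput increase)}, \qquad 1 - \frac{74}{81} \approx 8.6\%\ \text{(less time per MB)}$$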

            People

              Assignee: xuchuanyin Chuanyin Xu
              Reporter: xuchuanyin Chuanyin Xu
              Votes: 0
              Watchers: 1

                Time Tracking

                  Estimated: Not Specified
                  Remaining: 0h
                  Logged: 14h