  1. CarbonData
  2. CARBONDATA-1014

Refactor on data loading and encoding override

Details

    • Type: Improvement
    • Status: Open
    • Priority: Major
    • Resolution: Unresolved

    Description

      Refactor the current data loading flow so that it:
      1. Uses vectorized processing as early as possible.
      2. Builds the index (sorting) in a CPU-cache-efficient way, by sorting a rowId vector against the key column vector.
      3. Opens an interface for format extension, including column encoding, compression, and statistics.
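
      The rowId-based sort in item 2 can be illustrated with a minimal sketch. It is not CarbonData's implementation, and the function and variable names (`sorted_row_ids`, `gather`, `key_column`, `value_column`) are hypothetical. The idea shown is that only a compact vector of row IDs is reordered, using the key column for comparisons, and the other columns are gathered once at the end. Plain Python lists do not give the cache behavior themselves; the sketch only shows the data flow.

```python
from typing import List, Sequence


def sorted_row_ids(key_column: Sequence[int]) -> List[int]:
    """Return row IDs ordered by key value, without moving any row data.

    Only the row ID vector is permuted; the key column is read for comparisons.
    Python's sort is stable, so equal keys keep their original row order.
    """
    return sorted(range(len(key_column)), key=key_column.__getitem__)


def gather(column: Sequence, row_ids: Sequence[int]) -> list:
    """Materialize one column in sorted order using the row ID vector."""
    return [column[i] for i in row_ids]


# Hypothetical columnar batch: one key column and one non-key column.
key_column = [30, 10, 20, 10]
value_column = ["c", "a", "b", "d"]

row_ids = sorted_row_ids(key_column)
sorted_keys = gather(key_column, row_ids)
sorted_values = gather(value_column, row_ids)
```

      Each non-key column is touched only once, during the final gather, instead of being moved on every swap of a row-wise sort.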

      Design doc will be posted in this JIRA soon.

          People

            Assignee: Unassigned
            Reporter: Jacky Li (jackylk)
            Votes: 0
            Watchers: 1

            Dates

              Created:
              Updated:

              Time Tracking

                Estimated: Not Specified
                Remaining: 0h
                Logged: 66h 40m