CarbonData / CARBONDATA-844

Avoid getting useless splits

Details

    • Type: Improvement
    • Status: Reopened
    • Priority: Major
    • Resolution: Unresolved
    • Affects Version/s: 1.1.0
    • Fix Version/s: None
    • Component/s: core
    • Labels: None

    Description

      The current implementation of CarbonInputFormat.getDataBlocksOfSegment works as follows:
      1. Get all of the carbondata splits in the segment directory.
      2. Read the carbonindex and construct the B-tree.
      3. Apply filter and get matching splits.

      I think this fetches splits that the filter later discards, and the getSplits operation is expensive. So we had better call getSplits after filtering:
      1. List the segment directory, and filter the path of carbonindex.
      2. Read the carbonindex and construct the B-tree.
      3. Apply filter and get matching blocks.
      4. Get carbondata splits from filtered blocks.
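
      The proposed order can be sketched roughly as below. This is only an illustration, not CarbonData's actual code: the class SplitPruningSketch, the types BlockInfo and Split, and both helper methods are hypothetical names, the ".carbonindex" suffix check is an assumption about how index files are recognised, and a plain list stands in for the B-tree built from the carbonindex (step 2).

```java
import java.util.ArrayList;
import java.util.List;
import java.util.function.Predicate;

// Hypothetical sketch of "filter first, then create splits".
class SplitPruningSketch {

    // Hypothetical stand-in for one block entry read from a carbonindex file.
    static final class BlockInfo {
        final String dataFilePath;
        final long minValue;
        final long maxValue;

        BlockInfo(String dataFilePath, long minValue, long maxValue) {
            this.dataFilePath = dataFilePath;
            this.minValue = minValue;
            this.maxValue = maxValue;
        }
    }

    // Hypothetical stand-in for an input split over one carbondata file.
    static final class Split {
        final String dataFilePath;

        Split(String dataFilePath) {
            this.dataFilePath = dataFilePath;
        }
    }

    // Step 1: from the segment directory listing, keep only carbonindex paths.
    // The suffix is an assumption made for this sketch.
    static List<String> indexFiles(List<String> segmentListing) {
        List<String> result = new ArrayList<>();
        for (String path : segmentListing) {
            if (path.endsWith(".carbonindex")) {
                result.add(path);
            }
        }
        return result;
    }

    // Steps 3 and 4: apply the filter to the indexed blocks, then create
    // splits only for the blocks that survive it.
    static List<Split> splitsAfterFilter(List<BlockInfo> blocks,
                                         Predicate<BlockInfo> filter) {
        List<Split> splits = new ArrayList<>();
        for (BlockInfo block : blocks) {
            if (filter.test(block)) {
                splits.add(new Split(block.dataFilePath));
            }
        }
        return splits;
    }
}
```

      With this ordering, the cost of creating a split is paid only for blocks whose min/max range can match the filter, instead of for every carbondata file in the segment.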

          People

            Assignee: waterman Yadong Qi
            Reporter: waterman Yadong Qi
            Votes: 0
            Watchers: 1

              Time Tracking

                Original Estimate: Not Specified
                Remaining Estimate: 0h
                Time Spent: 0.5h