Details
Type: Improvement
Status: Reopened
Priority: Major
Resolution: Unresolved
1.1.0
None
None
Description
In the current implementation of CarbonInputFormat.getDataBlocksOfSegment, the steps are:
1. Get all of the carbondata splits in segments directory.
2. Read the carbonindex and construct the B-tree.
3. Apply filter and get matching splits.
I think this produces some useless splits, and the getSplits operation is expensive. So it would be better to call getSplits only after filtering:
1. List the segment directory, and filter the path of carbonindex.
2. Read the carbonindex and construct the B-tree.
3. Apply filter and get matching blocks.
4. Get carbondata splits from filtered blocks.
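
The proposed ordering could look roughly like the sketch below. It is only an illustration under stated assumptions, not the actual CarbonData code: `BlockEntry`, `Split`, `filterIndexFiles`, `applyFilter` and `toSplits` are hypothetical names, the B-tree built in step 2 is modeled as a plain list of block entries with min/max values, and the `.carbonindex` suffix check stands in for the real path filter.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.function.Predicate;

public class FilterBeforeSplits {

    /** Hypothetical stand-in for one block entry loaded from a carbonindex file. */
    public static final class BlockEntry {
        final String dataFilePath;
        final long offset;
        final long length;
        final int min; // minimum column value in the block (used for pruning)
        final int max; // maximum column value in the block

        public BlockEntry(String dataFilePath, long offset, long length, int min, int max) {
            this.dataFilePath = dataFilePath;
            this.offset = offset;
            this.length = length;
            this.min = min;
            this.max = max;
        }
    }

    /** Hypothetical stand-in for an input split over a carbondata file. */
    public static final class Split {
        final String path;
        final long start;
        final long length;

        public Split(String path, long start, long length) {
            this.path = path;
            this.start = start;
            this.length = length;
        }
    }

    /** Step 1: keep only the carbonindex paths from the segment directory listing. */
    public static List<String> filterIndexFiles(List<String> segmentListing) {
        List<String> indexFiles = new ArrayList<>();
        for (String path : segmentListing) {
            if (path.endsWith(".carbonindex")) {
                indexFiles.add(path);
            }
        }
        return indexFiles;
    }

    /** Step 3: apply the filter to the block entries (step 2 is assumed to have loaded them). */
    public static List<BlockEntry> applyFilter(List<BlockEntry> blocks, Predicate<BlockEntry> filter) {
        List<BlockEntry> matching = new ArrayList<>();
        for (BlockEntry block : blocks) {
            if (filter.test(block)) {
                matching.add(block);
            }
        }
        return matching;
    }

    /** Step 4: create splits only for the blocks that survived the filter. */
    public static List<Split> toSplits(List<BlockEntry> matchingBlocks) {
        List<Split> splits = new ArrayList<>();
        for (BlockEntry block : matchingBlocks) {
            splits.add(new Split(block.dataFilePath, block.offset, block.length));
        }
        return splits;
    }

    public static void main(String[] args) {
        List<String> listing = Arrays.asList("part-0-0.carbondata", "0.carbonindex", "part-1-0.carbondata");
        System.out.println("index files: " + filterIndexFiles(listing));

        // Pretend these entries were read from the index file in step 2.
        List<BlockEntry> blocks = Arrays.asList(
                new BlockEntry("part-0-0.carbondata", 0L, 1024L, 0, 10),
                new BlockEntry("part-1-0.carbondata", 0L, 2048L, 11, 20));

        // Filter "column == 15": only blocks whose [min, max] range contains 15 can match.
        List<Split> splits = toSplits(applyFilter(blocks, b -> b.min <= 15 && 15 <= b.max));
        for (Split s : splits) {
            System.out.println(s.path + " [" + s.start + ", " + (s.start + s.length) + ")");
        }
    }
}
```

With this ordering, the cost of creating splits scales with the number of matching blocks rather than with the number of carbondata files in the segment.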