Details
- Type: Bug
- Status: Resolved
- Priority: Major
- Resolution: Fixed
Description
A similar problem in the bloom datamap is tracked in CARBONDATA-2746, but the test result is wrong if we apply the same fix to the lucene datamap.
Analysis:
In `DataMapChooser#extractColumnExpression`, `MatchExpression` is not handled, so no column name is extracted from the filter and none is available for selecting a datamap.
In `DataMapChooser#contains`, all datamaps are marked as useful once a lucene filter (`ExpressionType.TEXT_MATCH`) is present. The first datamap is then chosen after the sort step (sorted by number of index columns).
In `LuceneDataMapFactoryBase#toDistributable`, carbon calls `getAllIndexDirs` and builds a `DataMapDistributable` for each index in the same segment. This means that one segment is pruned by several different index datamaps (lucene uses the `indexPath` in `LuceneDataMapDistributable` to initialize its datamap object and build the `indexSearcherMap`).
In our test case, we build datamaps on the columns name and city, one datamap for each.
The query uses column `name` as the filter. Unfortunately, `DataMapChooser` chooses the datamap of the city column.
Then, in the `toDistributable` method, it gets all datamaps and builds a `LuceneDataMapDistributable` for each. In our test, it therefore prunes with, and gets a result from, each datamap.
On the city datamap, the lucene query "name:c10" returns no rows. On the name datamap, the same query returns exactly what we want.
So, if we apply the same fix as in CARBONDATA-2746 to lucene, we get only one datamap (the one for the city column) and the prune result is empty.
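The effect described above can be reproduced with a small stand-alone model. This is only a sketch: `ColumnIndex` and its methods are hypothetical stand-ins for a per-column lucene index, the rows are made up, and none of this is CarbonData or Lucene API. It shows that a "name:c10" query finds the row only when it is run against the index built on `name`.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

public class WrongIndexPruneSketch {

    // Hypothetical stand-in for a lucene index built on a single column.
    static final class ColumnIndex {
        final String column;
        final Map<String, List<Integer>> postings = new HashMap<>();

        ColumnIndex(String column) {
            this.column = column;
        }

        void add(String value, int rowId) {
            postings.computeIfAbsent(value, k -> new ArrayList<>()).add(rowId);
        }

        // Answers a simple "field:term" query. A field this index does not
        // cover matches nothing, which is what happens on the city index.
        List<Integer> search(String query) {
            String[] parts = query.split(":", 2);
            if (parts.length != 2 || !parts[0].equals(column)) {
                return Collections.emptyList();
            }
            return postings.getOrDefault(parts[1], Collections.emptyList());
        }
    }

    // Index on the name column, filled with made-up rows.
    static ColumnIndex nameIndex() {
        ColumnIndex idx = new ColumnIndex("name");
        idx.add("c10", 10);
        idx.add("c11", 11);
        return idx;
    }

    // Index on the city column, filled with made-up rows.
    static ColumnIndex cityIndex() {
        ColumnIndex idx = new ColumnIndex("city");
        idx.add("shenzhen", 10);
        idx.add("beijing", 11);
        return idx;
    }

    public static void main(String[] args) {
        String query = "name:c10";
        System.out.println("city index: " + cityIndex().search(query));
        System.out.println("name index: " + nameIndex().search(query));
    }
}
```

If only the city index survives the selection step, the empty result is all that remains, which matches the wrong test result reported here.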
To Fix:
- choose the correct datamap in `DataMapChooser` for lucene
- apply the same fix as in CARBONDATA-2746 to build the correct `LuceneDataMapDistributable`
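One way to picture the first fix item is to take the column name from the text-match query and keep only a datamap that indexes that column. The sketch below is an illustration under stated assumptions, not the actual patch: `DataMapInfo`, `fieldOf`, and `choose` are hypothetical names, and the parsing handles only a single "field:term" query rather than full lucene query syntax.

```java
import java.util.Arrays;
import java.util.List;
import java.util.Optional;

public class TextMatchChooserSketch {

    // Hypothetical description of a datamap: its name and indexed columns.
    static final class DataMapInfo {
        final String name;
        final List<String> indexColumns;

        DataMapInfo(String name, String... indexColumns) {
            this.name = name;
            this.indexColumns = Arrays.asList(indexColumns);
        }
    }

    // Extracts the field from a simple "field:term" query.
    // Returns empty when the query carries no field prefix.
    static Optional<String> fieldOf(String query) {
        int idx = query.indexOf(':');
        if (idx <= 0) {
            return Optional.empty();
        }
        return Optional.of(query.substring(0, idx).trim());
    }

    // Picks the first candidate whose index columns contain the queried
    // field, instead of taking whichever datamap sorts first.
    static Optional<DataMapInfo> choose(List<DataMapInfo> candidates, String query) {
        Optional<String> field = fieldOf(query);
        if (!field.isPresent()) {
            return Optional.empty();
        }
        return candidates.stream()
                .filter(dm -> dm.indexColumns.contains(field.get()))
                .findFirst();
    }

    public static void main(String[] args) {
        // The city datamap is listed first on purpose, mirroring the bug scenario.
        List<DataMapInfo> candidates = Arrays.asList(
                new DataMapInfo("dm_city", "city"),
                new DataMapInfo("dm_name", "name"));
        System.out.println(choose(candidates, "name:c10").map(dm -> dm.name).orElse("none"));
    }
}
```

With the chooser keyed on the queried column, the single distributable built afterwards would point at the index that can actually answer the query.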