[CARBONDATA-2747] Fix Lucene datamap choosing and DataMapDistributable building - ASF JIRA

Details

Type: Bug
Status: Resolved
Priority: Major
Resolution: Fixed
Affects Version/s: None
Fix Version/s: 1.4.1
Component/s: None
Labels: None

Description

A similar problem in the bloom datamap is tracked in ~~CARBONDATA-2746~~, but the test result is wrong if we apply only that same fix to Lucene.

Analysis:

`DataMapChooser#extractColumnExpression` does not handle `MatchExpression`, so the chooser has no column name with which to filter datamaps.

In `DataMapChooser#contains`, all datamaps are marked as useful when a Lucene datamap is hit (`ExpressionType.TEXT_MATCH`). The first datamap is then chosen after the sort step (sorted by number of index columns).
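The column-aware choice that the chooser lacks can be sketched as follows. This is a minimal illustration, not the CarbonData implementation: `TextMatchChooserSketch`, `IndexedDataMap`, `columnOf`, and `choose` are hypothetical names, and the sketch assumes the query is a simple `field:term` string such as "name:c10".

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

// Hypothetical stand-ins for illustration; these are not CarbonData classes.
final class TextMatchChooserSketch {

  /** A datamap reduced to its name and the columns it indexes. */
  static final class IndexedDataMap {
    final String name;
    final List<String> indexColumns;

    IndexedDataMap(String name, String... indexColumns) {
      this.name = name;
      this.indexColumns = Arrays.asList(indexColumns);
    }
  }

  /** Returns the field of a simple "field:term" query, or null if there is no field prefix. */
  static String columnOf(String luceneQuery) {
    int colon = luceneQuery.indexOf(':');
    return colon > 0 ? luceneQuery.substring(0, colon).trim() : null;
  }

  /** Keeps only the datamaps that index the column named in the query. */
  static List<IndexedDataMap> choose(List<IndexedDataMap> candidates, String luceneQuery) {
    String column = columnOf(luceneQuery);
    List<IndexedDataMap> useful = new ArrayList<>();
    for (IndexedDataMap dataMap : candidates) {
      if (column != null && dataMap.indexColumns.contains(column)) {
        useful.add(dataMap);
      }
    }
    return useful;
  }
}
```

With one datamap on `city` and one on `name`, `choose(candidates, "name:c10")` keeps only the `name` datamap regardless of the order of the candidates. Real Lucene queries can span several fields or omit the field entirely, which this sketch does not attempt to handle.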

In `LuceneDataMapFactoryBase#toDistributable`, Carbon calls `getAllIndexDirs` and builds a `DataMapDistributable` for each index in the same segment. This means that one segment is pruned by several different index datamaps (Lucene uses `indexPath` in `LuceneDataMapDistributable` to initialize its datamap object and build the `indexSearcherMap`).

In our test case, we build two datamaps, one on column `name` and one on column `city`.

The query filters on column `name`. Unfortunately, `DataMapChooser` chooses the datamap on the `city` column.

Then the `toDistributable` method gets all datamaps and builds a `LuceneDataMapDistributable` for each. In our test, the query therefore prunes against every datamap and gets a result from each.

On the `city` datamap, the Lucene query "name:c10" returns no rows. On the `name` datamap, the same query returns exactly the rows we want.

So if we apply only the ~~CARBONDATA-2746~~ fix to Lucene, we get a single datamap (the one on the `city` column) and the prune result is empty.
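The interaction described above can be reproduced with a toy model. The sketch below is not Lucene or CarbonData code: `PruneSimulation` and `ToyIndex` are hypothetical, and each index is assumed to hold a single field and to answer only simple `field:term` queries.

```java
import java.util.Collections;
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.Set;
import java.util.TreeSet;

// Toy model for illustration only; not Lucene or CarbonData code.
final class PruneSimulation {

  /** A single-field index that maps a term to the row ids containing it. */
  static final class ToyIndex {
    final String field;
    final Map<String, Set<Integer>> postings = new HashMap<>();

    ToyIndex(String field) {
      this.field = field;
    }

    ToyIndex add(String term, int rowId) {
      postings.computeIfAbsent(term, k -> new TreeSet<>()).add(rowId);
      return this;
    }

    /** Answers "field:term"; a query on a field this index does not hold matches nothing. */
    Set<Integer> search(String query) {
      String[] parts = query.split(":", 2);
      if (parts.length != 2 || !parts[0].equals(field)) {
        return Collections.emptySet();
      }
      return postings.getOrDefault(parts[1], Collections.emptySet());
    }
  }

  /** Unions the hits from every index that is asked to prune the segment. */
  static Set<Integer> prune(List<ToyIndex> indexes, String query) {
    Set<Integer> rows = new TreeSet<>();
    for (ToyIndex index : indexes) {
      rows.addAll(index.search(query));
    }
    return rows;
  }
}
```

Pruning with both the `city` and `name` indexes returns the matching rows because the `name` index contributes them, which hides the wrong choice made by the chooser. Pruning with the `city` index alone returns an empty set, which is the failure seen once only one distributable is built.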

To Fix:

Choose the correct datamap in `DataMapChooser` for Lucene.
Apply the same fix as in ~~CARBONDATA-2746~~ to build the correct `LuceneDataMapDistributable`.
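The second step might look roughly like the sketch below, which builds distributables only for index directories that belong to the chosen datamap. Everything here is an assumption made for illustration: `DistributableSketch`, `IndexDir`, and `Distributable` are hypothetical types, and tagging each directory with a segment id and datamap name is a simplification, since the issue does not describe the actual directory layout or the change made in the pull request.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical stand-ins for illustration; these are not CarbonData classes.
final class DistributableSketch {

  /** One index directory, tagged with the segment and datamap it belongs to (assumed layout). */
  static final class IndexDir {
    final String segmentId;
    final String dataMapName;
    final String path;

    IndexDir(String segmentId, String dataMapName, String path) {
      this.segmentId = segmentId;
      this.dataMapName = dataMapName;
      this.path = path;
    }
  }

  /** A distributable reduced to the single index path it will open. */
  static final class Distributable {
    final String indexPath;

    Distributable(String indexPath) {
      this.indexPath = indexPath;
    }
  }

  /** Builds one distributable per index directory of the chosen datamap in the given segment. */
  static List<Distributable> toDistributable(
      List<IndexDir> allIndexDirs, String segmentId, String chosenDataMap) {
    List<Distributable> result = new ArrayList<>();
    for (IndexDir dir : allIndexDirs) {
      // Skip directories of other segments and of other datamaps in this segment.
      if (dir.segmentId.equals(segmentId) && dir.dataMapName.equals(chosenDataMap)) {
        result.add(new Distributable(dir.path));
      }
    }
    return result;
  }
}
```

This filter only gives correct results when the chooser has already picked the datamap that indexes the queried column, which is why both steps are needed together.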

Issue Links

links to

GitHub Pull Request #2519

People

Assignee: Unassigned

Reporter: Manhua Jiang

Votes: 0

Watchers: 1

Dates

Created: 17/Jul/18 01:04

Updated: 24/Jul/18 06:42

Resolved: 24/Jul/18 06:42

Time Tracking

Estimated: Not Specified

Remaining:

Logged: 1h 20m