CarbonData / CARBONDATA-2747

Fix Lucene datamap choosing and DataMapDistributable building


Details

    • Type: Bug
    • Status: Resolved
    • Priority: Major
    • Resolution: Fixed
    • Affects Version/s: None
    • Fix Version/s: 1.4.1
    • Component/s: None
    • Labels: None

    Description

      A similar problem in the bloom datamap is tracked in CARBONDATA-2746, but applying the same fix to the Lucene datamap makes the test result wrong.

       

      Analysis:

      `DataMapChooser#extractColumnExpression` does not handle `MatchExpression`, so no column name is extracted from a text-match filter and the chooser has nothing to filter datamaps by.
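      To illustrate what "extracting the column" from a text-match filter could look like, here is a minimal sketch. It assumes the query string has the simple `field:term` shape used in this issue ("name:c10"); real Lucene query syntax is richer, and the class and method names below are hypothetical, not CarbonData APIs.

```java
import java.util.Optional;

// Hypothetical helper for illustration only; not part of CarbonData.
final class TextMatchColumns {

    /**
     * Returns the field name in front of the first ':' of a simple
     * "field:term" query string, or empty if there is no field prefix.
     */
    static Optional<String> columnOf(String queryString) {
        if (queryString == null) {
            return Optional.empty();
        }
        int colon = queryString.indexOf(':');
        if (colon <= 0) {
            // No colon, or nothing before it: no column can be derived.
            return Optional.empty();
        }
        return Optional.of(queryString.substring(0, colon).trim());
    }

    public static void main(String[] args) {
        System.out.println(columnOf("name:c10"));
    }
}
```

      With a column name available, the chooser can compare it against each datamap's index columns instead of accepting every datamap.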

       

      In `DataMapChooser#contains`, every datamap is marked as useful when the filter hits a Lucene datamap (`ExpressionType.TEXT_MATCH`). After the sort step (by number of index columns), the first datamap is chosen, whether or not it indexes the filtered column.
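      The effect can be reproduced with a small model. This is a sketch under assumptions: the types are hypothetical stand-ins rather than CarbonData classes, and the ascending sort order is assumed because the issue only says the sort is by number of index columns.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Comparator;
import java.util.List;

// Simplified, hypothetical model; none of these types are CarbonData classes.
final class ChooserSketch {

    /** A datamap reduced to its name and the columns it indexes. */
    static final class DataMap {
        final String name;
        final List<String> indexColumns;

        DataMap(String name, List<String> indexColumns) {
            this.name = name;
            this.indexColumns = indexColumns;
        }
    }

    /** Described behaviour: on a text match, every datamap is a candidate. */
    static DataMap chooseIgnoringColumn(List<DataMap> dataMaps) {
        return firstAfterSort(new ArrayList<>(dataMaps));
    }

    /** Intended behaviour: only datamaps indexing the filter column are candidates. */
    static DataMap chooseByColumn(List<DataMap> dataMaps, String filterColumn) {
        List<DataMap> candidates = new ArrayList<>();
        for (DataMap dataMap : dataMaps) {
            if (dataMap.indexColumns.contains(filterColumn)) {
                candidates.add(dataMap);
            }
        }
        return firstAfterSort(candidates);
    }

    // Stable sort by number of index columns (ascending is an assumption),
    // then take the head of the list; null when there is no candidate.
    private static DataMap firstAfterSort(List<DataMap> candidates) {
        candidates.sort(Comparator.comparingInt((DataMap d) -> d.indexColumns.size()));
        return candidates.isEmpty() ? null : candidates.get(0);
    }

    public static void main(String[] args) {
        List<DataMap> dataMaps = Arrays.asList(
            new DataMap("dm_city", Arrays.asList("city")),
            new DataMap("dm_name", Arrays.asList("name")));
        System.out.println(chooseIgnoringColumn(dataMaps).name);
        System.out.println(chooseByColumn(dataMaps, "name").name);
    }
}
```

      Because both datamaps have one index column, the stable sort keeps their original order, so the column-blind variant returns whichever datamap happens to come first.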

       

      In `LuceneDataMapFactoryBase#toDistributable`, carbon calls `getAllIndexDirs` and builds a `DataMapDistributable` for each index in the same segment. As a result, one segment is pruned by several different index datamaps (Lucene uses `indexPath` in `LuceneDataMapDistributable` to initialize its datamap object and build the `indexSearcherMap`).
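      The difference between building a distributable for every index directory and building one only for the chosen datamap can be sketched as follows. The directory layout, the `IndexDir` type, and both method names are hypothetical; each returned path stands in for one `LuceneDataMapDistributable` and its `indexPath`.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

// Hypothetical model of index directories inside one segment.
final class DistributableSketch {

    /** One index directory: which datamap wrote it and where it lives. */
    static final class IndexDir {
        final String dataMapName;
        final String path;

        IndexDir(String dataMapName, String path) {
            this.dataMapName = dataMapName;
            this.path = path;
        }
    }

    /** Described behaviour: one distributable per index directory in the segment. */
    static List<String> forAllIndexDirs(List<IndexDir> dirs) {
        List<String> indexPaths = new ArrayList<>();
        for (IndexDir dir : dirs) {
            indexPaths.add(dir.path);
        }
        return indexPaths;
    }

    /** Restricted behaviour: only directories that belong to the chosen datamap. */
    static List<String> forDataMap(List<IndexDir> dirs, String chosenDataMap) {
        List<String> indexPaths = new ArrayList<>();
        for (IndexDir dir : dirs) {
            if (dir.dataMapName.equals(chosenDataMap)) {
                indexPaths.add(dir.path);
            }
        }
        return indexPaths;
    }

    public static void main(String[] args) {
        List<IndexDir> dirs = Arrays.asList(
            new IndexDir("dm_city", "Segment_0/dm_city"),
            new IndexDir("dm_name", "Segment_0/dm_name"));
        System.out.println(forAllIndexDirs(dirs));
        System.out.println(forDataMap(dirs, "dm_name"));
    }
}
```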

       

      In our test case, we build two datamaps, one on column `name` and one on column `city`.

      The query filters on column `name`. Unfortunately, `DataMapChooser` chooses the datamap on column `city`.

      Then the `toDistributable` method gets all datamaps and builds a `LuceneDataMapDistributable` for each. In our test, pruning therefore runs against every datamap and collects the result from each.

      On the `city` datamap, the Lucene query "name:c10" returns no rows. On the `name` datamap, the same query returns exactly the rows we want.

       

      So if we apply the same fix as in CARBONDATA-2746 to Lucene, we get only one datamap (the one for the `city` column) and the prune result is empty.
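      The interaction of the two problems can be simulated with an in-memory stand-in for the Lucene indexes. This is only a sketch: the `Index` class, the row ids, and the indexed values are made up, and combining per-index hits by union is an assumption about how the results are merged.

```java
import java.util.Arrays;
import java.util.Collections;
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.Set;
import java.util.TreeSet;

// Hypothetical in-memory stand-in for per-datamap Lucene indexes.
final class PruneSketch {

    /** Maps field -> term -> matching row ids. */
    static final class Index {
        private final Map<String, Map<String, Set<Integer>>> postings = new HashMap<>();

        void add(String field, String term, int row) {
            postings.computeIfAbsent(field, f -> new HashMap<>())
                    .computeIfAbsent(term, t -> new TreeSet<>())
                    .add(row);
        }

        Set<Integer> search(String field, String term) {
            Map<String, Set<Integer>> terms = postings.get(field);
            if (terms == null) {
                // The field is not indexed here, so nothing can match.
                return Collections.emptySet();
            }
            Set<Integer> rows = terms.get(term);
            return rows == null ? Collections.<Integer>emptySet() : rows;
        }
    }

    /** Runs the query on every given index and unions the hits (assumed merge rule). */
    static Set<Integer> prune(List<Index> indexes, String field, String term) {
        Set<Integer> hits = new TreeSet<>();
        for (Index index : indexes) {
            hits.addAll(index.search(field, term));
        }
        return hits;
    }

    public static void main(String[] args) {
        Index cityIndex = new Index();
        cityIndex.add("city", "shenzhen", 10);
        Index nameIndex = new Index();
        nameIndex.add("name", "c10", 10);

        // All distributables: the wrong choice is masked by the name index.
        System.out.println(prune(Arrays.asList(cityIndex, nameIndex), "name", "c10"));
        // Only the wrongly chosen city datamap: nothing is found.
        System.out.println(prune(Arrays.asList(cityIndex), "name", "c10"));
    }
}
```

      Under these assumptions, restricting the distributables without first correcting the chooser turns a hidden mis-selection into an empty result, which is why both fixes are needed together.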

       

      To Fix:

      1. Choose the correct datamap in `DataMapChooser` for Lucene.
      2. Apply the same fix as in CARBONDATA-2746 to build the correct `LuceneDataMapDistributable`.


            People

              Assignee: Unassigned
              Reporter: Manhua Jiang


                Time Tracking

                  Estimated: Not Specified
                  Remaining: 0h
                  Logged: 1h 20m