Details
Type: Bug
Status: Resolved
Priority: Minor
Resolution: Won't Fix
Description
I have a problem when using n-grams with the Highlighter. I thought it had been solved in LUCENE-627, but it still occurs.
I originally found this problem when using CJKTokenizer in Solr. Here is a Lucene program that reproduces it using NGramTokenizer (min=2, max=2) instead of CJKTokenizer:
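```java
import java.io.Reader;

import org.apache.lucene.analysis.Analyzer;
import org.apache.lucene.analysis.TokenStream;
import org.apache.lucene.analysis.ngram.NGramTokenizer;
import org.apache.lucene.queryParser.QueryParser;
import org.apache.lucene.search.Query;
import org.apache.lucene.search.highlight.Highlighter;
import org.apache.lucene.search.highlight.QueryScorer;

public class TestNGramHighlighter {
    public static void main(String[] args) throws Exception {
        Analyzer analyzer = new NGramAnalyzer();
        final String TEXT = "Lucene can make index. Then Lucene can search.";
        final String QUERY = "can";
        // Parse the query with the same bigram analyzer used for highlighting
        QueryParser parser = new QueryParser("f", analyzer);
        Query query = parser.parse(QUERY);
        QueryScorer scorer = new QueryScorer(query, "f");
        Highlighter h = new Highlighter(scorer);
        System.out.println(h.getBestFragment(analyzer, "f", TEXT));
    }

    // Analyzer that emits character bigrams (min=2, max=2)
    static class NGramAnalyzer extends Analyzer {
        public TokenStream tokenStream(String field, Reader input) {
            return new NGramTokenizer(input, 2, 2);
        }
    }
}
```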
The expected output is:
Lucene <B>can</B> make index. Then Lucene <B>can</B> search.
but the actual output is:
Lucene <B>can make index. Then Lucene can</B> search.
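The following simplified model, in plain Java with no Lucene dependency, may help show where the long span comes from. It rests on assumptions that are not taken from the Highlighter source: that the query "can" is analyzed into the bigrams "ca" and "an", that NGramTokenizer emits every bigram of the raw text (spaces included), and that the Highlighter puts tokens with overlapping offsets into one group and marks up from the first to the last matching token of that group. Because consecutive bigrams always overlap by one character, the whole sentence then becomes a single group. The names NGramHighlightModel, groupSpans and matchSpans are hypothetical; matchSpans is a hypothetical alternative, for comparison only, that merges only matching tokens that overlap each other.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Set;

// Hypothetical, simplified model of overlapping n-gram offsets and highlighting.
// This is NOT Lucene's Highlighter code; other details of the real class are ignored.
class NGramHighlightModel {

    // A token with character offsets [start, end) into the original text.
    static final class Token {
        final String term;
        final int start;
        final int end;
        Token(String term, int start, int end) {
            this.term = term;
            this.start = start;
            this.end = end;
        }
    }

    // Assumption: every n-gram of the raw text is emitted, spaces included.
    static List<Token> ngrams(String text, int n) {
        List<Token> out = new ArrayList<>();
        for (int i = 0; i + n <= text.length(); i++) {
            out.add(new Token(text.substring(i, i + n), i, i + n));
        }
        return out;
    }

    // Assumed grouping: any token (matching or not) joins the current group while its
    // start offset is before the group's end offset. Within a group, the highlight runs
    // from the first matching token's start to the last matching token's end.
    static List<int[]> groupSpans(List<Token> tokens, Set<String> terms) {
        List<int[]> spans = new ArrayList<>();
        int groupEnd = -1, matchStart = -1, matchEnd = -1;
        for (Token t : tokens) {
            if (t.start >= groupEnd) {                 // no overlap: close the current group
                if (matchStart >= 0) spans.add(new int[] {matchStart, matchEnd});
                matchStart = -1;
                matchEnd = -1;
            }
            groupEnd = Math.max(groupEnd, t.end);
            if (terms.contains(t.term)) {
                if (matchStart < 0) matchStart = t.start;
                matchEnd = Math.max(matchEnd, t.end);
            }
        }
        if (matchStart >= 0) spans.add(new int[] {matchStart, matchEnd});
        return spans;
    }

    // Hypothetical alternative: only matching tokens are merged, and only when they overlap.
    static List<int[]> matchSpans(List<Token> tokens, Set<String> terms) {
        List<int[]> spans = new ArrayList<>();
        for (Token t : tokens) {
            if (!terms.contains(t.term)) continue;
            int[] last = spans.isEmpty() ? null : spans.get(spans.size() - 1);
            if (last != null && t.start < last[1]) {
                last[1] = Math.max(last[1], t.end);    // overlaps the previous match: extend it
            } else {
                spans.add(new int[] {t.start, t.end});
            }
        }
        return spans;
    }

    // Wrap each [start, end) span of the text in <B>...</B>.
    static String render(String text, List<int[]> spans) {
        StringBuilder sb = new StringBuilder();
        int pos = 0;
        for (int[] s : spans) {
            sb.append(text, pos, s[0]).append("<B>").append(text, s[0], s[1]).append("</B>");
            pos = s[1];
        }
        return sb.append(text, pos, text.length()).toString();
    }

    public static void main(String[] args) {
        String text = "Lucene can make index. Then Lucene can search.";
        Set<String> terms = Set.of("ca", "an");        // assumed bigrams of the query "can"
        List<Token> tokens = ngrams(text, 2);
        System.out.println(render(text, groupSpans(tokens, terms)));
        System.out.println(render(text, matchSpans(tokens, terms)));
    }
}
```

Under these assumptions, the first printed line matches the actual output above and the second matches the expected output, which is consistent with overlapping bigram offsets chaining the whole text into one group.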
Issue Links
- is duplicated by: LUCENE-6200 Highlighter sometime went wrong (Resolved)