Details
- Type: Bug
- Status: Open
- Priority: Major
- Resolution: Unresolved
- Affects Version/s: 7.7
- Fix Version/s: None
- Component/s: None
- Labels: None
- Lucene Fields: New
Description
I have a simple phrase query and a token stream that uses WordDelimiterGraphFilter, and the query fails to match. I tried different configurations of WordDelimiterGraphFilter but could not find a working one. I don't actually know whether the problem is on the word-delimiter side or on the span-queries side.
Query which is generated:
spanNear([field:added, spanOr([field:foobarbaz, spanNear([field:foo, field:bar, field:baz], 0, true)]), field:entry], 0, true)
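For reference, the token graph that the search analyzer produces for "Added FooBarBaz entry" can be sketched without Lucene itself: the catenated token foobarbaz spans three positions (positionLength = 3), while the parts foo/bar/baz occupy one position each, so both paths through the graph end at the same position. A minimal illustration in plain Java (the Token record here is hypothetical, purely for this sketch):

```java
import java.util.List;

public class TokenGraphSketch {
    // Hypothetical minimal token model: term, start position, positionLength.
    record Token(String term, int position, int positionLength) {
        int endPosition() { return position + positionLength; }
    }

    public static void main(String[] args) {
        // Search-time graph for "Added FooBarBaz entry"
        // (GENERATE_WORD_PARTS | CATENATE_WORDS | SPLIT_ON_CASE_CHANGE):
        List<Token> graph = List.of(
            new Token("added", 0, 1),
            new Token("foobarbaz", 1, 3),  // catenated token spans three positions
            new Token("foo", 1, 1),
            new Token("bar", 2, 1),
            new Token("baz", 3, 1),
            new Token("entry", 4, 1));

        // Both paths through the graph (foobarbaz, or foo->bar->baz) end at
        // position 4, so "entry" is adjacent to either path in the graph.
        Token catenated = graph.get(1);
        Token baz = graph.get(4);
        System.out.println(catenated.endPosition() == baz.endPosition()); // true
    }
}
```

Note that the Lucene index does not store positionLength, so at index time the foobarbaz token effectively ends at position 2 rather than 4, which is one reason position-based queries over such graphs are fragile.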
The test code in which I isolated the problem is attached below:
import org.apache.lucene.analysis.Analyzer;
import org.apache.lucene.analysis.CharArraySet;
import org.apache.lucene.analysis.LowerCaseFilter;
import org.apache.lucene.analysis.MockTokenizer;
import org.apache.lucene.analysis.TokenFilter;
import org.apache.lucene.analysis.Tokenizer;
import org.apache.lucene.analysis.miscellaneous.WordDelimiterGraphFilter;
import org.apache.lucene.analysis.miscellaneous.WordDelimiterIterator;
import org.apache.lucene.document.Document;
import org.apache.lucene.document.Field;
import org.apache.lucene.index.IndexReader;
import org.apache.lucene.index.RandomIndexWriter;
import org.apache.lucene.queryparser.classic.QueryParser;
import org.apache.lucene.search.IndexSearcher;
import org.apache.lucene.search.Query;
import org.apache.lucene.search.ScoreDoc;
import org.apache.lucene.store.Directory;
import org.apache.lucene.util.LuceneTestCase;
import org.junit.AfterClass;
import org.junit.BeforeClass;

public class TestPhrase extends LuceneTestCase {

  private static IndexSearcher searcher;
  private static IndexReader reader;
  private Query query;
  private static Directory directory;

  private static Analyzer searchAnalyzer = new Analyzer() {
    @Override
    public TokenStreamComponents createComponents(String fieldName) {
      Tokenizer tokenizer = new MockTokenizer(MockTokenizer.WHITESPACE, false);
      TokenFilter filter1 = new WordDelimiterGraphFilter(tokenizer,
          WordDelimiterIterator.DEFAULT_WORD_DELIM_TABLE,
          WordDelimiterGraphFilter.GENERATE_WORD_PARTS
              | WordDelimiterGraphFilter.CATENATE_WORDS
              | WordDelimiterGraphFilter.CATENATE_NUMBERS
              | WordDelimiterGraphFilter.SPLIT_ON_CASE_CHANGE,
          CharArraySet.EMPTY_SET);
      TokenFilter filter2 = new LowerCaseFilter(filter1);
      return new TokenStreamComponents(tokenizer, filter2);
    }
  };

  private static Analyzer indexAnalyzer = new Analyzer() {
    @Override
    public TokenStreamComponents createComponents(String fieldName) {
      Tokenizer tokenizer = new MockTokenizer(MockTokenizer.WHITESPACE, false);
      TokenFilter filter1 = new WordDelimiterGraphFilter(tokenizer,
          WordDelimiterIterator.DEFAULT_WORD_DELIM_TABLE,
          WordDelimiterGraphFilter.GENERATE_WORD_PARTS
              | WordDelimiterGraphFilter.GENERATE_NUMBER_PARTS
              | WordDelimiterGraphFilter.CATENATE_WORDS
              | WordDelimiterGraphFilter.CATENATE_NUMBERS
              | WordDelimiterGraphFilter.PRESERVE_ORIGINAL
              | WordDelimiterGraphFilter.SPLIT_ON_CASE_CHANGE,
          CharArraySet.EMPTY_SET);
      TokenFilter filter2 = new LowerCaseFilter(filter1);
      return new TokenStreamComponents(tokenizer, filter2);
    }

    @Override
    public int getPositionIncrementGap(String fieldName) {
      return 100;
    }
  };

  @BeforeClass
  public static void beforeClass() throws Exception {
    directory = newDirectory();
    RandomIndexWriter writer = new RandomIndexWriter(random(), directory, indexAnalyzer);
    Document doc = new Document();
    doc.add(newTextField("field", "Added FooBarBaz entry", Field.Store.YES));
    writer.addDocument(doc);
    reader = writer.getReader();
    writer.close();
    searcher = new IndexSearcher(reader);
  }

  @Override
  public void setUp() throws Exception {
    super.setUp();
  }

  @AfterClass
  public static void afterClass() throws Exception {
    searcher = null;
    reader.close();
    reader = null;
    directory.close();
    directory = null;
  }

  public void testSearch() throws Exception {
    QueryParser parser = new QueryParser("field", searchAnalyzer);
    query = parser.parse("\"Added FooBarBaz entry\"");
    System.out.println(query);
    ScoreDoc[] hits = searcher.search(query, 1000).scoreDocs;
    assertEquals(1, hits.length);
  }
}
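The slop check an ordered SpanNear performs can be sketched without Lucene: consecutive sub-spans must appear in order, and the total of the gaps between them must not exceed the slop. A plain-Java illustration, assuming index positions added=0, foobarbaz=[1,2) (positionLength is lost at index time), foo=1, bar=2, baz=3, entry=4 (the Span record and gap helper are hypothetical, for this sketch only):

```java
import java.util.List;

public class OrderedNearSketch {
    // Hypothetical span: start position and exclusive end position.
    record Span(int start, int end) {}

    // Sum of gaps between consecutive ordered spans; a match needs gap <= slop.
    static int gap(List<Span> spans) {
        int total = 0;
        for (int i = 1; i < spans.size(); i++) {
            total += spans.get(i).start() - spans.get(i - 1).end();
        }
        return total;
    }

    public static void main(String[] args) {
        Span added = new Span(0, 1);
        Span catenated = new Span(1, 2);  // foobarbaz as indexed (posLength dropped)
        Span parts = new Span(1, 4);      // foo -> bar -> baz as one ordered sub-near
        Span entry = new Span(4, 5);

        // The word-parts path lines up exactly, so slop 0 should accept it:
        System.out.println(gap(List.of(added, parts, entry)));     // 0
        // The catenated path leaves a 2-position gap before "entry":
        System.out.println(gap(List.of(added, catenated, entry))); // 2
    }
}
```

Under this model the foo/bar/baz branch of the spanOr should still satisfy slop 0, which suggests the failure lies in how the nested spanNear/spanOr combination is evaluated (see the linked LUCENE-7398).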
NOTE: I tested it on Lucene 7.1.0, 7.4.0 and 7.7.0
Attachments
Issue Links
- is part of
-
LUCENE-7398 Nested Span Queries are buggy
- Open