LUCENE-8695: Word delimiter graph or span queries bug

Details

    • Type: Bug
    • Status: Open
    • Priority: Major
    • Resolution: Unresolved
    • Affects Version/s: 7.7
    • Fix Version/s: None
    • Component/s: None
    • Labels: None
    • Lucene Fields: New

    Description

      I have a simple phrase query and a token stream that uses WordDelimiterGraphFilter, and the query fails to match. I tried different configurations of the word delimiter graph filter but could not find one that works. I don't know whether the problem is on the word delimiter graph side or on the span queries side.

      The query that gets generated:

       spanNear([field:added, spanOr([field:foobarbaz, spanNear([field:foo, field:bar, field:baz], 0, true)]), field:entry], 0, true)

      The test in which I isolated the problem is below:

      import org.apache.lucene.analysis.Analyzer;
      import org.apache.lucene.analysis.CharArraySet;
      import org.apache.lucene.analysis.LowerCaseFilter;
      import org.apache.lucene.analysis.MockTokenizer;
      import org.apache.lucene.analysis.TokenFilter;
      import org.apache.lucene.analysis.Tokenizer;
      import org.apache.lucene.analysis.miscellaneous.WordDelimiterGraphFilter;
      import org.apache.lucene.analysis.miscellaneous.WordDelimiterIterator;
      import org.apache.lucene.document.Document;
      import org.apache.lucene.document.Field;
      import org.apache.lucene.index.IndexReader;
      import org.apache.lucene.index.RandomIndexWriter;
      import org.apache.lucene.queryparser.classic.QueryParser;
      import org.apache.lucene.search.IndexSearcher;
      import org.apache.lucene.search.Query;
      import org.apache.lucene.search.ScoreDoc;
      import org.apache.lucene.store.Directory;
      import org.apache.lucene.util.LuceneTestCase;
      import org.junit.AfterClass;
      import org.junit.BeforeClass;

      public class TestPhrase extends LuceneTestCase {
      
        private static IndexSearcher searcher;
        private static IndexReader reader;
        private Query query;
        private static Directory directory;
      
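        // Query-side analyzer: GENERATE_WORD_PARTS, CATENATE_WORDS, CATENATE_NUMBERS,
        // SPLIT_ON_CASE_CHANGE -- no PRESERVE_ORIGINAL and no GENERATE_NUMBER_PARTS.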
        private static Analyzer searchAnalyzer = new Analyzer() {
          @Override
          public TokenStreamComponents createComponents(String fieldName) {
            Tokenizer tokenizer = new MockTokenizer(MockTokenizer.WHITESPACE, false);
            TokenFilter filter1 = new WordDelimiterGraphFilter(tokenizer, WordDelimiterIterator.DEFAULT_WORD_DELIM_TABLE,
                WordDelimiterGraphFilter.GENERATE_WORD_PARTS |
                    WordDelimiterGraphFilter.CATENATE_WORDS |
                    WordDelimiterGraphFilter.CATENATE_NUMBERS |
                    WordDelimiterGraphFilter.SPLIT_ON_CASE_CHANGE,
                CharArraySet.EMPTY_SET);
      
            TokenFilter filter2 = new LowerCaseFilter(filter1);
      
            return new TokenStreamComponents(tokenizer, filter2);
          }
        };
      
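        // Index-side analyzer: same flags as the query side, plus GENERATE_NUMBER_PARTS
        // and PRESERVE_ORIGINAL, and a position increment gap between field values.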
        private static Analyzer indexAnalyzer = new Analyzer() {
          @Override
          public TokenStreamComponents createComponents(String fieldName) {
            Tokenizer tokenizer = new MockTokenizer(MockTokenizer.WHITESPACE, false);
            TokenFilter filter1 = new WordDelimiterGraphFilter(tokenizer, WordDelimiterIterator.DEFAULT_WORD_DELIM_TABLE,
                WordDelimiterGraphFilter.GENERATE_WORD_PARTS |
                WordDelimiterGraphFilter.GENERATE_NUMBER_PARTS |
                WordDelimiterGraphFilter.CATENATE_WORDS |
                WordDelimiterGraphFilter.CATENATE_NUMBERS |
                WordDelimiterGraphFilter.PRESERVE_ORIGINAL |
                WordDelimiterGraphFilter.SPLIT_ON_CASE_CHANGE,
                CharArraySet.EMPTY_SET);
      
            TokenFilter filter2 = new LowerCaseFilter(filter1);
      
            return new TokenStreamComponents(tokenizer, filter2);
          }
      
          @Override
          public int getPositionIncrementGap(String fieldName) {
            return 100;
          }
        };
      
        @BeforeClass
        public static void beforeClass() throws Exception {
          directory = newDirectory();
          RandomIndexWriter writer = new RandomIndexWriter(random(), directory, indexAnalyzer);
      
          Document doc = new Document();
          doc.add(newTextField("field", "Added FooBarBaz entry", Field.Store.YES));
          writer.addDocument(doc);
      
          reader = writer.getReader();
          writer.close();
      
          searcher = new IndexSearcher(reader);
        }
      
        @Override
        public void setUp() throws Exception {
          super.setUp();
        }
      
        @AfterClass
        public static void afterClass() throws Exception {
          searcher = null;
          reader.close();
          reader = null;
          directory.close();
          directory = null;
        }
      
        public void testSearch() throws Exception {
          QueryParser parser = new QueryParser("field", searchAnalyzer);
          query = parser.parse("\"Added FooBarBaz entry\"");
          System.out.println(query);
          ScoreDoc[] hits = searcher.search(query, 1000).scoreDocs;
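          // Fails here: expected 1 hit, but the phrase query does not match the indexed document.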
          assertEquals(1, hits.length);
        }
      
      }
      

      NOTE: I tested this on Lucene 7.1.0, 7.4.0, and 7.7.0.
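
      To make the difference between the two sides visible, the token streams of both analyzers can be dumped with the standard TokenStream attribute API. A minimal triage sketch (indexAnalyzer refers to the analyzer in the test above; swap in searchAnalyzer to compare the query side):

      import org.apache.lucene.analysis.TokenStream;
      import org.apache.lucene.analysis.tokenattributes.CharTermAttribute;
      import org.apache.lucene.analysis.tokenattributes.PositionIncrementAttribute;
      import org.apache.lucene.analysis.tokenattributes.PositionLengthAttribute;

      // Print each token with its position increment and position length;
      // stacked tokens (same position) show up with posInc=0. Note that
      // position length exists only in the token stream -- it is not stored
      // in the index.
      try (TokenStream ts = indexAnalyzer.tokenStream("field", "Added FooBarBaz entry")) {
        CharTermAttribute term = ts.addAttribute(CharTermAttribute.class);
        PositionIncrementAttribute posInc = ts.addAttribute(PositionIncrementAttribute.class);
        PositionLengthAttribute posLen = ts.addAttribute(PositionLengthAttribute.class);
        ts.reset();
        while (ts.incrementToken()) {
          System.out.println(term + " posInc=" + posInc.getPositionIncrement()
              + " posLen=" + posLen.getPositionLength());
        }
        ts.end();
      }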

People

    • Assignee: Unassigned
    • Reporter: Pawel Rog (prog)
    • Votes: 3
    • Watchers: 6
