Lucene - Core / LUCENE-6255

PhraseQuery inconsistencies

Details

    • Type: Bug
    • Status: Closed
    • Priority: Major
    • Resolution: Fixed
    • Affects Version/s: None
    • Fix Version/s: 5.1
    • Component/s: None
    • Labels: None
    • Lucene Fields: New

    Description

      PhraseQuery behaves quite inconsistently when the position of the first term is greater than 0. Here is an example:

          // Runs inside a LuceneTestCase subclass, which provides
          // newDirectory(), random() and newSearcher().
          Directory dir = newDirectory();
          RandomIndexWriter iw = new RandomIndexWriter(random(), dir);
          FieldType customType = new FieldType(TextField.TYPE_NOT_STORED);
          customType.setOmitNorms(true);
          Field f = new Field("body", "", customType);
          Document doc = new Document();
          doc.add(f);
          // Index one document; the terms are expected at positions one=0, quick=1, fox=2.
          f.setStringValue("one quick fox");
          iw.addDocument(doc);
          IndexReader ir = iw.getReader();
          iw.close();
          IndexSearcher is = newSearcher(ir);

          // 1st query: "quick fox" with query positions 0 and 1.
          PhraseQuery pq = new PhraseQuery();
          pq.add(new Term("body", "quick"), 0);
          pq.add(new Term("body", "fox"), 1);
          System.out.println(is.search(pq, 1).totalHits); // 1

          // 2nd query: same phrase, query positions shifted to 10 and 11.
          pq = new PhraseQuery();
          pq.add(new Term("body", "quick"), 10);
          pq.add(new Term("body", "fox"), 11);
          System.out.println(is.search(pq, 1).totalHits); // 0

          // 3rd query: a single term at query position 10.
          pq = new PhraseQuery();
          pq.add(new Term("body", "quick"), 10);
          System.out.println(is.search(pq, 1).totalHits); // 1

          // 4th query: same as the 2nd query, but with a slop of 1.
          pq = new PhraseQuery();
          pq.add(new Term("body", "quick"), 10);
          pq.add(new Term("body", "fox"), 11);
          pq.setSlop(1);
          System.out.println(is.search(pq, 1).totalHits); // 1

          ir.close();
          dir.close();
      

      The reason is that when you add a term with position P to a PhraseQuery, ExactPhraseScorer ignores all occurrences of this term at positions less than P.
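To make the rule above concrete, here is a minimal toy model in plain Java. It is not Lucene's ExactPhraseScorer; the class `StrictPhraseModel` and its `matches` method are hypothetical names invented for illustration. It assumes the document "one quick fox" has its terms at positions 0, 1 and 2, and it treats a phrase as matching when every term yields the same value of (document position minus query position), after dropping occurrences whose document position is below the query position.

```java
import java.util.HashSet;
import java.util.List;
import java.util.Map;
import java.util.Set;

// Toy model only (hypothetical class): this is not Lucene's ExactPhraseScorer.
class StrictPhraseModel {

    /**
     * Returns true if every term has an occurrence whose
     * (document position - query position) is the same value for all terms.
     * Occurrences below the term's query position are skipped, which is the
     * behaviour the issue describes for exact phrase matching.
     */
    static boolean matches(Map<String, int[]> docPositions, List<String> terms, int[] queryPositions) {
        Set<Integer> starts = null;
        for (int i = 0; i < terms.size(); i++) {
            int[] positions = docPositions.getOrDefault(terms.get(i), new int[0]);
            Set<Integer> candidates = new HashSet<>();
            for (int pos : positions) {
                if (pos < queryPositions[i]) {
                    continue; // occurrence is "before" the requested position: ignored
                }
                candidates.add(pos - queryPositions[i]);
            }
            if (starts == null) {
                starts = candidates;
            } else {
                starts.retainAll(candidates); // keep only phrase starts shared by all terms
            }
            if (starts.isEmpty()) {
                return false;
            }
        }
        return starts != null; // an empty term list matches nothing in this model
    }

    public static void main(String[] args) {
        // Assumed positions for "one quick fox": one=0, quick=1, fox=2.
        Map<String, int[]> doc = Map.of(
            "one", new int[] {0},
            "quick", new int[] {1},
            "fox", new int[] {2});
        List<String> phrase = List.of("quick", "fox");

        System.out.println(matches(doc, phrase, new int[] {0, 1}));   // true
        System.out.println(matches(doc, phrase, new int[] {10, 11})); // false
    }
}
```

In this model the second call fails only because both occurrences sit below their query positions; the gap between the two terms is the same in both calls.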

      But this is inconsistent:

      • With a single term, the position is not enforced, since we rewrite to a term query regardless of the term's position (3rd query).
      • If you increase the slop, we use SloppyPhraseScorer, which does not have this behaviour (4th query).

      So I think we have two options:

      • either remove this behaviour and make the positions that are provided to PhraseQuery only relative (i.e. fix ExactPhraseScorer)
      • or make it work this way across the board (which means not rewriting to a term query when the position is not 0, and fixing SloppyPhraseScorer).
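As a rough illustration of what "only relative" could mean in the first option, the sketch below shifts a set of query positions so that the smallest becomes 0 while the gaps between terms are preserved. This is an assumption about one possible approach, not a description of the attached patch; `RelativePositions` and `rebase` are hypothetical names.

```java
import java.util.Arrays;

// Hypothetical helper: one possible way to treat query positions as relative.
class RelativePositions {

    /** Returns a copy of the positions shifted so that the smallest one is 0. */
    static int[] rebase(int[] queryPositions) {
        if (queryPositions.length == 0) {
            return new int[0];
        }
        int min = Arrays.stream(queryPositions).min().getAsInt();
        int[] rebased = new int[queryPositions.length];
        for (int i = 0; i < queryPositions.length; i++) {
            rebased[i] = queryPositions[i] - min; // gaps between terms are unchanged
        }
        return rebased;
    }

    public static void main(String[] args) {
        System.out.println(Arrays.toString(rebase(new int[] {10, 11}))); // [0, 1]
        System.out.println(Arrays.toString(rebase(new int[] {3, 5})));   // [0, 2]
    }
}
```

Under this reading, the 1st and 2nd queries in the description would be equivalent, since positions 10 and 11 rebase to 0 and 1.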

      Attachments

        1. LUCENE-6255.patch
          8 kB
          Adrien Grand

          People

            Assignee: Adrien Grand (jpountz)
            Reporter: Adrien Grand (jpountz)
            Votes: 0
            Watchers: 4
