Details

    • Type: Bug
    • Status: Closed
    • Priority: Major
    • Resolution: Fixed
    • Affects Version/s: None
    • Fix Version/s: 5.1
    • Component/s: None
    • Labels:
      None
    • Lucene Fields:
      New

      Description

      PhraseQuery behaves quite inconsistently when the position of the first term is greater than 0. Here is an example:

          Directory dir = newDirectory();
          RandomIndexWriter iw = new RandomIndexWriter(random(), dir);
          FieldType customType = new FieldType(TextField.TYPE_NOT_STORED);
          customType.setOmitNorms(true);
          Field f = new Field("body", "", customType);
          Document doc = new Document();
          doc.add(f);
          f.setStringValue("one quick fox");
          iw.addDocument(doc);
          IndexReader ir = iw.getReader();
          iw.close();
          IndexSearcher is = newSearcher(ir);
          
          PhraseQuery pq = new PhraseQuery();
          pq.add(new Term("body", "quick"), 0);
          pq.add(new Term("body", "fox"), 1);
          System.out.println(is.search(pq, 1).totalHits); // 1
      
          pq = new PhraseQuery();
          pq.add(new Term("body", "quick"), 10);
          pq.add(new Term("body", "fox"), 11);
          System.out.println(is.search(pq, 1).totalHits); // 0
          
          pq = new PhraseQuery();
          pq.add(new Term("body", "quick"), 10);
          System.out.println(is.search(pq, 1).totalHits); // 1
          
          pq = new PhraseQuery();
          pq.add(new Term("body", "quick"), 10);
          pq.add(new Term("body", "fox"), 11);
          pq.setSlop(1);
          System.out.println(is.search(pq, 1).totalHits); // 1
          
          ir.close();
          dir.close();
      

      The reason is that when you add a term with position P on a PhraseQuery, ExactPhraseScorer ignores all positions for this term which are less than P.

      But this is inconsistent:

      • if you have a single term, it does not work anymore since we rewrite to a term query regardless of the position of the term (3rd query)
      • if you increase the slop, we will use SloppyPhraseScorer which does not have this behaviour. (4th query)

      So I think we have two options:

      • either remove this behaviour and make the positions that are provided to PhraseQuery only relative (ie. fix ExactPhraseScorer)
      • or make it work this way across the board (which means not rewriting to a term query when the position is not 0 and fixing SloppyPhraseScorer).

        Attachments

          Activity

            People

            • Assignee:
              jpountz Adrien Grand
              Reporter:
              jpountz Adrien Grand
            • Votes:
              0 Vote for this issue
              Watchers:
              4 Start watching this issue

              Dates

              • Created:
                Updated:
                Resolved: