Uploaded image for project: 'Lucene - Core'
  1. Lucene - Core
  2. LUCENE-7541

FVH does not work well with phrases and multiple tags

Details

    • Bug
    • Status: Open
    • Major
    • Resolution: Unresolved
    • trunk
    • None
    • modules/highlighter
    • None
    • New

    Description

      I'm indexing a document with a field which is aaa bbb ccc ddd bbb eee fff.

      I'm running a Bool Query which contains 2 should Phrase queries: aaa bbb and eee fff.

      I'm using an FVH with two tags <1></1> and <2></2>.

      It gives the correct result: <1>aaa bbb</1> ccc ddd bbb <2>eee fff</2>

      With same settings, I'm now running with 2 should Phrase queries: aaa bbb and bbb eee.

      I'm getting back a wrong result: <1>aaa bbb</1> ccc ddd <1>bbb eee</1> fff where I'm expecting <1>aaa bbb</1> ccc ddd <2>bbb eee</2> fff.

      Why this?

      Apparently the FVH is getting back as sequence numbers in the first case 0 and 1 but in the second case 0 and 2.

      The problem is when we call then getPreTag, we are getting the first tag instead of the second one:

        protected String getPreTag( String[] preTags, int num ){
          int n = num % preTags.length;
          return preTags[n];
        }
        
        protected String getPostTag( String[] postTags, int num ){
          int n = num % postTags.length;
          return postTags[n];
        }
      

      I did not find yet how to fix that. But I believe it is somewhere in org.apache.lucene.search.vectorhighlight.FieldQuery class

          private void markTerminal( int slop, float boost ){
            this.terminal = true;
            this.slop = slop;
            this.boost = boost;
            this.termOrPhraseNumber = fieldQuery.nextTermOrPhraseNumber();
          }
      

      This call to nextTermOrPhraseNumber() increments the term number I guess because we have already seen the term BBB previously.

      I'm going to join a test case patch.

      Attachments

        Activity

          People

            Unassigned Unassigned
            dadoonet David Pilato
            Votes:
            0 Vote for this issue
            Watchers:
            3 Start watching this issue

            Dates

              Created:
              Updated: