Lucene - Core
LUCENE-3243

FastVectorHighlighter - add position offset to FieldPhraseList.WeightedPhraseInfo.Toffs

    Details

    • Type: Improvement
    • Status: Open
    • Priority: Minor
    • Resolution: Unresolved
    • Affects Version/s: 3.2
    • Fix Version/s: None
    • Component/s: modules/highlighter
    • Labels:
    • Environment: Lucene 3.2
    • Lucene Fields: New

      Description

      We need to return position offsets along with the highlighted snippets when using FastVectorHighlighter (FVH) for highlighting.

      Using the LUCENE-3141 patch I was able to get the fragInfo for a particular phrase search. Currently the Toffs (term offsets) class stores only the start and end offsets.

      To get the position offset, I added position offset information to the Toffs and FieldPhraseList classes.

      1. CustomSolrHighlighter.java
        7 kB
        Jahangir Anwari
      2. LUCENE-3243.patch.diff
        3 kB
        Jahangir Anwari

        Activity

        Jahangir Anwari added a comment (edited)

        Hi Koji,

        Sorry for not elaborating more on our requirements and our implementation. Basically, for every search result we needed the position (word offset) of each search hit in the document. On the search results page, this position offset information was embedded in the search result links. When the user clicked a search result link, JavaScript on the target page used the position offset information to highlight the search terms.

        To return the position offset information along with the highlighted snippet, we created a CustomSolrHighlighter (attached). The custom highlighter returns the position offset information differently depending on the type of query:

        1. Non-phrase query: Using FieldTermStack we return the term position offset for the terms in the query.
        2. Phrase query: Using the WeightedFragInfo.fragInfos we return the term position offset for the terms in the query.

        However, the Toffs (term offsets) class currently stores only the start and end offsets, so we updated it to store the position information as well.

        Answers to your questions:

        • What is the position offset? Isn't it just a position?
          Yes, it is just the position.
        • Why is the position offset String?
          Since for phrase queries (e.g. "divine knowledge") the position gap between terms == 1, WeightedPhraseInfo would store only the startOffset (i.e. 12) of the first term of the phrase and the endOffset (i.e. 29) of the phrase terms.

          		[startOffset, endOffset]
          "divine knowledge": [(12,29)]

          But as we needed the position information (i.e. 5,6) of all the terms, we had to store the positions of the terms of a phrase query as a String.

          	[startOffset, endOffset, positions]
          "divine knowledge": [(12,29, [5,6])]

        • Why do you need setPositionOffset()?
          setPositionOffset() is used to store the positions of the consecutive terms of a phrase query. For every term of the phrase query it just appends the argument position to the current position (i.e. [5,6]).
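
To make the answers above concrete, here is a minimal sketch of the idea. It is not the code from the attached patch: TermOffsets is a hypothetical stand-in for Toffs, append() is a hypothetical counterpart of setPositionOffset(), and the positions are kept in a List<Integer> rather than a String. The intermediate end offset 18 for "divine" is made up for illustration; 12, 29, 5 and 6 come from the example above.

```java
import java.util.ArrayList;
import java.util.List;

public class ToffsSketch {

    /** Hypothetical stand-in for Toffs: character offsets plus term positions. */
    static final class TermOffsets {
        final int startOffset;
        int endOffset;
        final List<Integer> positions = new ArrayList<>();

        TermOffsets(int startOffset, int endOffset, int position) {
            this.startOffset = startOffset;
            this.endOffset = endOffset;
            this.positions.add(position);
        }

        /** Extends the span to the next consecutive phrase term and records its position. */
        void append(int newEndOffset, int position) {
            this.endOffset = newEndOffset;
            this.positions.add(position);
        }

        @Override
        public String toString() {
            return "(" + startOffset + "," + endOffset + ", " + positions + ")";
        }
    }

    public static void main(String[] args) {
        // "divine" at position 5; the end offset 18 is an assumed value.
        TermOffsets toffs = new TermOffsets(12, 18, 5);
        // "knowledge" at position 6 extends the phrase span to end offset 29.
        toffs.append(29, 6);
        System.out.println(toffs); // prints (12,29, [5, 6])
    }
}
```

Keeping the positions in a list avoids parsing a String later, at the cost of diverging from the String representation described above.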

        Example output:

        <lst name="/book/title/pg15">
           <arr name="para">
               <str>un of <strong class="highlight">divine knowledge</strong> and understanding, and become the recipients of a grace that is infinite and </str>
           </arr>
           <str name="positionOffsets">80,81,118,119</str>
        </lst>
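
As a rough illustration of how a value like the positionOffsets string above could be assembled, the following sketch flattens per-hit position lists into one comma-separated string. The class and method names are hypothetical, and grouping the values as two phrase hits, (80,81) and (118,119), is an assumption; the actual logic lives in the attached CustomSolrHighlighter.java.

```java
import java.util.Arrays;
import java.util.List;
import java.util.stream.Collectors;

public class PositionOffsetsJoin {

    /** Flattens the positions of every hit into a single comma-separated string. */
    static String join(List<List<Integer>> hitPositions) {
        return hitPositions.stream()
                .flatMap(List::stream)      // one stream of positions across all hits
                .map(String::valueOf)
                .collect(Collectors.joining(","));
    }

    public static void main(String[] args) {
        // Assumed grouping: two phrase hits, each covering two consecutive positions.
        List<List<Integer>> hits = Arrays.asList(
                Arrays.asList(80, 81),
                Arrays.asList(118, 119));
        System.out.println(join(hits)); // prints 80,81,118,119
    }
}
```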
        
        

        P.S. To be able to override the doHighlightingByFastVectorHighlighter() method in CustomSolrHighlighter, we had to change the access modifier of alternateField() and getSolrFragmentsBuilder() to protected.

        Koji Sekiguchi added a comment -

        Thank you for the proposal and patch! I don't understand:

        • What is the position offset? Isn't it just a position?
        • Why is the position offset String?
        • Why do you need setPositionOffset()? I don't understand the implementation of the method... it appends the argument position to the current position.

          People

          • Assignee: Unassigned
          • Reporter: Jahangir Anwari
          • Votes: 0
          • Watchers: 0

            Dates

            • Created:
              Updated:
