Lucene.Net / LUCENENET-156

Contrib Highlighter.net -> getBestTextFragments error

    Details

    • Type: Bug
    • Status: Closed
    • Priority: Major
    • Resolution: Fixed
    • Affects Version/s: None
    • Fix Version/s: None
    • Component/s: None
    • Labels: None
    • Environment: .NET framework

      Description

      Lines 274 and 275 are commented out, apparently because they were written incorrectly. They should look like this:

      if (lastEndOffset < text.Length)
      newText.Append(encoder.EncodeText(text.Substring(lastEndOffset)));

      If this code is commented out, the end of the field can be cut off. For example, if the field ends with </span>, newText would end with </span
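To illustrate the point above, here is a minimal Java sketch of a markup loop with and without the final tail append. It is not the Highlighter source: the class name `TailAppendSketch`, the `markup` method, the `{start, end}` offset pairs, and the bracket tags are all hypothetical, and the encoder step is left out.

```java
class TailAppendSketch {
    // Hypothetical model, not Lucene code: spans holds {start, end} character
    // offsets of the highlighted tokens, in ascending order.
    static String markup(String text, int[][] spans, String pre, String post, boolean appendTail) {
        StringBuilder newText = new StringBuilder();
        int lastEndOffset = 0;
        for (int[] span : spans) {
            int startOffset = span[0];
            int endOffset = span[1];
            if (startOffset > lastEndOffset) {
                // Copy the untouched text between the previous token and this one.
                newText.append(text, lastEndOffset, startOffset);
            }
            newText.append(pre).append(text, startOffset, endOffset).append(post);
            lastEndOffset = Math.max(lastEndOffset, endOffset);
        }
        // The step the report asks to restore: copy whatever follows the last token.
        if (appendTail && lastEndOffset < text.length()) {
            newText.append(text.substring(lastEndOffset));
        }
        return newText.toString();
    }

    public static void main(String[] args) {
        String text = "a novela de costumbres";
        int[][] hits = { {2, 8} }; // offsets of "novela"
        System.out.println(markup(text, hits, "[", "]", true));  // a [novela] de costumbres
        System.out.println(markup(text, hits, "[", "]", false)); // a [novela]
    }
}
```

In this simplified model, everything after the last token is lost when the tail append is skipped. A trailing `>` after a final `span` token would be dropped in the same way, which would match the `</span` symptom described above.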

      Please correct it in the next release.

      Poul

        Activity

        Digy added a comment -

        Hi Enrique,
        You are right. I restored Highlighter.cs

        DIGY

        Enrique Martínez Zúñiga added a comment -

        After applying the patch, the sample code below produces the following highlighting:

        Searching for "novela":
        Una en otra: <span class=""highlight"">novela</span> de ccostumbres

        Instead of:
        Una en otra: <span class=""highlight"">novela</span> de costumbres

        And:
        <span class=""highlight"">Novela</span> CCostumbrista

        Instead of:
        <span class=""highlight"">Novela</span> Costumbrista
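One plausible reading of the doubled letter is that the gap before a token is copied with a length one character too long, as in the `+ 1` change discussed elsewhere in this issue. The Java sketch below reproduces that effect under stated assumptions. `OffByOneSketch`, `csSubstring` (a stand-in for C#'s `Substring(startIndex, length)`), the `{start, end, hit}` token triples, and the `<b>` tags are hypothetical and are not taken from Highlighter.cs.

```java
class OffByOneSketch {
    // Hypothetical helper mimicking C#'s String.Substring(startIndex, length).
    static String csSubstring(String s, int startIndex, int length) {
        return s.substring(startIndex, startIndex + length);
    }

    // tokens: {start, end, hit} triples; hit is 1 when the token matches the query.
    // extra: 0 copies exactly the gap, 1 models the "+ 1" variant.
    static String markup(String text, int[][] tokens, int extra) {
        StringBuilder newText = new StringBuilder();
        int lastEndOffset = 0;
        for (int[] t : tokens) {
            int startOffset = t[0];
            int endOffset = t[1];
            if (startOffset > lastEndOffset) {
                // Gap between the previous token and this one.
                newText.append(csSubstring(text, lastEndOffset, startOffset - lastEndOffset + extra));
            }
            String tokenText = csSubstring(text, startOffset, endOffset - startOffset);
            newText.append(t[2] == 1 ? "<b>" + tokenText + "</b>" : tokenText);
            lastEndOffset = Math.max(lastEndOffset, endOffset);
        }
        if (lastEndOffset < text.length()) {
            newText.append(text.substring(lastEndOffset));
        }
        return newText.toString();
    }

    public static void main(String[] args) {
        String text = "Novela Costumbrista";
        int[][] tokens = { {0, 6, 1}, {7, 19, 0} };
        System.out.println(markup(text, tokens, 0)); // <b>Novela</b> Costumbrista
        System.out.println(markup(text, tokens, 1)); // <b>Novela</b> CCostumbrista
    }
}
```

With `extra = 1` the gap copy already includes the first letter of the next token, so that letter appears twice once the token itself is appended.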

        Sample Code

        ...
        Dim idxSF As SimpleHTMLFormatter = New SimpleHTMLFormatter("<span class=""highlight"">", "</span>")
        Dim qs As QueryScorer = New QueryScorer(qry.Rewrite(idxReader))
        Dim highlighter As Highlighter = New Highlighter(idxSF, qs)
        Dim ts As TokenStream = idxAnalyzer.TokenStream(strFieldName, New StringReader(strValue))
        Dim strText As String = highlighter.GetBestFragments(ts, strValue, 80, "...")
        ts.Close()
        ...

        Digy made changes -
        Field        Original Value   New Value
        Status       Open [ 1 ]       Closed [ 6 ]
        Resolution                    Fixed [ 1 ]
        Digy added a comment -

        Line 269:

        - newText.Append(encoder.EncodeText(text.Substring(lastEndOffset, (startOffset) - (lastEndOffset))));
        + newText.Append(encoder.EncodeText(text.Substring(lastEndOffset, (startOffset) - (lastEndOffset)+1)));

        Fixed.

        George Aroush added a comment -

        Hi DIGY,

        I have not done much with the Highlighter code or any of the contrib code in a while. My comment to Poul was based on comparing this line to what's in the Java version. I believe the fix needs to be based on my comment and not Poul's.

        – George

        Digy added a comment -

        Hi George,
        I see changes were made to Highlighter.cs on 31/Dec/2008. What is its state: fixed or not?

        DIGY

        George Aroush added a comment -

        Good catch, but shouldn't the fix actually be:

        if (startOffset > lastEndOffset)
        newText.Append(encoder.EncodeText(text.Substring(lastEndOffset, (startOffset) - (lastEndOffset) + 1)));

        You have the test logic changed from ">" to "<"; in my case, I added "+ 1"

        The Java code looks like this:

        if (startOffset > lastEndOffset)
        newText.append(encoder.encodeText(text.substring(lastEndOffset, startOffset)));

        Java's substring() is: substring(int beginIndex, int endIndex)
        C#'s Substring() is: Substring(int startIndex, int length);
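The mapping between the two signatures can be checked with a short Java sketch. The helper `csSubstring` is a hypothetical emulation of C#'s `Substring(startIndex, length)`, written only for this comparison, and the offsets are made-up values for the sample string.

```java
class SubstringSemantics {
    // Hypothetical helper: Java emulation of C#'s String.Substring(startIndex, length).
    static String csSubstring(String s, int startIndex, int length) {
        return s.substring(startIndex, startIndex + length);
    }

    public static void main(String[] args) {
        String text = "novela de costumbres";
        int lastEndOffset = 6; // index just past "novela"
        int startOffset = 10;  // index of the 'c' in "costumbres"

        // Java form: the end index is exclusive.
        String javaGap = text.substring(lastEndOffset, startOffset);
        // Length-based form with length = end - begin.
        String sameGap = csSubstring(text, lastEndOffset, startOffset - lastEndOffset);
        // Length-based form with one extra character.
        String longerGap = csSubstring(text, lastEndOffset, startOffset - lastEndOffset + 1);

        System.out.println("[" + javaGap + "]");   // [ de ]
        System.out.println("[" + sameGap + "]");   // [ de ]
        System.out.println("[" + longerGap + "]"); // [ de c]
    }
}
```

Because Java's end index is exclusive, a length of `startOffset - lastEndOffset` selects the same characters as `substring(lastEndOffset, startOffset)`. Adding 1 to the length also pulls in the character at `startOffset`.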

        Poul Erik Nielsen created issue -

          People

          • Assignee: Unassigned
          • Reporter: Poul Erik Nielsen
          • Votes: 0
          • Watchers: 0
