Lucene - Core: LUCENE-4899

FastVectorHighlighter fails with SIOOB (StringIndexOutOfBoundsException) if a single phrase or term is > fragCharSize

    Details

    • Type: Bug
    • Status: Closed
    • Priority: Major
    • Resolution: Fixed
    • Affects Version/s: 4.0, 4.1, 4.2, 3.6.2, 4.2.1
    • Fix Version/s: 4.3, 6.0
    • Component/s: modules/highlighter
    • Labels:
      None
    • Lucene Fields:
      New, Patch Available

      Description

      This has been reported on several occasions, e.g. SOLR-4660 / SOLR-4137, and on the ES mailing list: https://groups.google.com/d/msg/elasticsearch/IdyMSPK5gao/nKZq8_NYWmgJ

      The reason is that the current code expects fragCharSize > matchLength, which is not necessarily true if you use phrases or have very long terms such as URLs. I have a test that reproduces the issue and, as far as I can tell, a fix (I don't have much experience with the highlighter).
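      To make the failure mode concrete, here is a minimal, hypothetical Java sketch (not the actual FastVectorHighlighter code) of a fragment builder that centers a fixed fragCharSize window on a match. The names `FragmentWindowDemo`, `windowStart` and `highlight` are illustrative assumptions. When matchLength exceeds fragCharSize, the computed margin goes negative, the window starts after the match, and the match offset relative to the fragment becomes negative, so `String.substring` throws a StringIndexOutOfBoundsException.

```java
// Hypothetical illustration, NOT the actual FastVectorHighlighter code.
final class FragmentWindowDemo {

    /** Start of a fragCharSize-wide window that tries to center the match. */
    static int windowStart(int matchStart, int matchLength, int fragCharSize) {
        // Negative when matchLength > fragCharSize: the "margin" then moves
        // the window start to the right of the match start.
        int margin = (fragCharSize - matchLength) / 2;
        return Math.max(0, matchStart - margin);
    }

    /** Cuts a fragment around the match and wraps the match in <b> tags. */
    static String highlight(String text, int matchStart, int matchLength, int fragCharSize) {
        int start = windowStart(matchStart, matchLength, fragCharSize);
        int end = Math.min(text.length(), start + fragCharSize);
        String fragment = text.substring(start, end);
        // Match offset relative to the fragment; negative if the window starts after the match.
        int relStart = matchStart - start;
        // With a negative relStart, substring(0, relStart) throws StringIndexOutOfBoundsException.
        return fragment.substring(0, relStart)
            + "<b>" + fragment.substring(relStart, relStart + matchLength) + "</b>"
            + fragment.substring(relStart + matchLength);
    }
}
```

      For example, `highlight("the quick brown fox jumps", 10, 5, 15)` returns `"uick <b>brown</b> fox "`, while highlighting a URL longer than fragCharSize (say, a 34-character URL with fragCharSize 10) throws.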

      1. LUCENE-4899.patch
        16 kB
        Simon Willnauer
      2. LUCENE-4899.patch
        7 kB
        Simon Willnauer


          Activity

          Simon Willnauer added a comment -

          Here is a patch, but somebody with more FVH experience should look at this. At least it includes a test that reproduces the failure.

          Simon Willnauer added a comment -

          As far as I can tell, those are caused by this.

          Simon Willnauer added a comment -

          Is anybody with more FVH knowledge up for a review?

          Koji Sekiguchi added a comment -

          Looks good, Simon!

          Simon Willnauer added a comment -

          Thanks, Koji, for looking at it. However, I don't think we should fix this the way I proposed in the previous patch. If a single phrase is longer than the fragCharSize, I'd rather not highlight that passage at all. This is more conservative, and I think it is the correct thing to do; otherwise we can easily end up with highlighted passages far larger than the fragment char size. I also tried to simplify BaseFragListBuilder a bit and made this entire behaviour pluggable, so folks can decide whether they want to risk exploding fragment sizes.

          Koji, can you take a look at this again?
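          The conservative policy described above, together with the option to let fragments grow, can be sketched as a pluggable predicate. This is a hypothetical Java illustration, not the extension point added by the committed patch: `OversizedMatchDemo`, `OversizePolicy`, `DISCARD`, `GROW` and `buildFragments` are invented names, and matches are assumed to be `{start, end}` offset pairs that lie within the text. With `DISCARD`, a match longer than fragCharSize produces no fragment; with `GROW`, the fragment is widened to cover the whole match.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical sketch only: these names are NOT Lucene's API.
final class OversizedMatchDemo {

    /** A fragment window [start, end) in the original text. */
    static final class Fragment {
        final int start;
        final int end;

        Fragment(int start, int end) {
            this.start = start;
            this.end = end;
        }

        int length() {
            return end - start;
        }
    }

    /** Pluggable decision: may a match of this length become a fragment? */
    interface OversizePolicy {
        boolean accept(int matchLength, int fragCharSize);
    }

    /** Conservative: drop matches longer than fragCharSize (no highlight). */
    static final OversizePolicy DISCARD = (matchLength, fragCharSize) -> matchLength <= fragCharSize;

    /** Permissive: keep every match and let its fragment exceed fragCharSize. */
    static final OversizePolicy GROW = (matchLength, fragCharSize) -> true;

    /** Builds one window per accepted match; matches are {start, end} pairs within the text. */
    static List<Fragment> buildFragments(int[][] matches, int textLength, int fragCharSize,
                                         OversizePolicy policy) {
        List<Fragment> fragments = new ArrayList<>();
        for (int[] match : matches) {
            int matchStart = match[0];
            int matchEnd = match[1];
            int matchLength = matchEnd - matchStart;
            if (!policy.accept(matchLength, fragCharSize)) {
                continue; // rejected matches get no window at all
            }
            // Clamp the margin at 0 so an oversized match never pushes the window past its start.
            int margin = Math.max(0, (fragCharSize - matchLength) / 2);
            int start = Math.max(0, matchStart - margin);
            // Extend the window to the match end if fragCharSize alone would cut the match.
            int end = Math.min(textLength, Math.max(start + fragCharSize, matchEnd));
            fragments.add(new Fragment(start, end));
        }
        return fragments;
    }
}
```

          Under either policy, every emitted window starts at or before its match and ends at or after it, so offsets relative to the fragment start are never negative; the trade-off is only whether oversized matches are dropped or produce fragments longer than fragCharSize.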

          Koji Sekiguchi added a comment -

          Looks good! Sounds reasonable and I like the idea.

          Commit Tag Bot added a comment -

          [trunk commit] simonw
          http://svn.apache.org/viewvc?view=revision&revision=1465032

          LUCENE-4899: FastVectorHighlighter failed with StringIndexOutOfBoundsException if a single highlight phrase or term was greater than the fragCharSize, producing negative string offsets

          Commit Tag Bot added a comment -

          [branch_4x commit] simonw
          http://svn.apache.org/viewvc?view=revision&revision=1465041

          LUCENE-4899: FastVectorHighlighter failed with StringIndexOutOfBoundsException if a single highlight phrase or term was greater than the fragCharSize, producing negative string offsets

          Uwe Schindler added a comment -

          Closed after release.


            People

            • Assignee:
              Simon Willnauer
              Reporter:
              Simon Willnauer
            • Votes:
              0
              Watchers:
              4
