  1. Jackrabbit Oak
  2. OAK-4368

Excerpt extraction from the Lucene index should be more selective


Details

    • Type: Improvement
    • Status: Closed
    • Priority: Major
    • Resolution: Fixed
    • Affects Version/s: 1.0.30, 1.2.14, 1.4.2, 1.5.2
    • Fix Version/s: 1.2.17, 1.4.4, 1.5.4, 1.6.0
    • Component/s: lucene
    • Labels: None

    Description

      The Lucene index can be used to extract rep:excerpt using the Highlighter.
      The current implementation may perform poorly when the result set of the original query contains many results, each of which may contain many (stored) properties. All of these properties are passed to the highlighter in an attempt to extract the excerpt, and the process does not stop once the first excerpt is found: the excerpt is composed using text from all stored properties in all results (wherever the query matches).

      While we can accept some cost for extracting the excerpt at query time (before OAK-3580 it was generated at excerpt retrieval time, e.g. via row.getValue("rep:excerpt")), that cost should be bounded and mitigated as much as possible.
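The kind of bounding described above can be illustrated with a minimal sketch. This is not the code from the attached patch and it does not use the Lucene Highlighter API: `ExcerptSketch`, `highlight`, `firstExcerpt`, and the `maxFields` limit are hypothetical names introduced only for illustration, and a plain substring search stands in for the highlighter. The point it shows is stopping at the first stored field that yields an excerpt, and capping how many fields are inspected per result.

```java
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.Optional;

public class ExcerptSketch {

    /**
     * Hypothetical stand-in for a highlighter: returns the text surrounding
     * the first (case-sensitive) occurrence of the term, or empty if absent.
     */
    static Optional<String> highlight(String text, String term, int context) {
        int at = text.indexOf(term);
        if (at < 0) {
            return Optional.empty();
        }
        int start = Math.max(0, at - context);
        int end = Math.min(text.length(), at + term.length() + context);
        return Optional.of(text.substring(start, end));
    }

    /**
     * Returns the excerpt from the first stored field that matches.
     * Remaining fields are never passed to the highlighter, and at most
     * maxFields fields are inspected (an assumed, illustrative bound).
     */
    static Optional<String> firstExcerpt(Map<String, String> storedFields,
                                         String term, int context, int maxFields) {
        int inspected = 0;
        for (Map.Entry<String, String> field : storedFields.entrySet()) {
            if (inspected++ >= maxFields) {
                break; // bound the per-result cost
            }
            Optional<String> fragment = highlight(field.getValue(), term, context);
            if (fragment.isPresent()) {
                return fragment; // stop as soon as one excerpt is found
            }
        }
        return Optional.empty();
    }

    public static void main(String[] args) {
        Map<String, String> fields = new LinkedHashMap<>();
        fields.put("title", "Release notes");
        fields.put("body", "The quick brown fox jumps");
        fields.put("footer", "fox again");
        System.out.println(firstExcerpt(fields, "fox", 6, 10).orElse("(no excerpt)"));
    }
}
```

With this shape, the cost per result is bounded by `maxFields` highlighter calls rather than by the total number of stored properties; whether the actual fix uses a similar limit is not stated in the issue.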

      Attachments

        1. OAK-4368.0.patch
          18 kB
          Tommaso Teofili

            People

              Assignee: Tommaso Teofili
              Reporter: Tommaso Teofili
              Votes: 0
              Watchers: 3
