Apache Jena / JENA-1459

add highlighting support to jena-text

    Details

    • Type: Improvement
    • Status: Closed
    • Priority: Major
    • Resolution: Done
    • Affects Version/s: Jena 3.6.0
    • Fix Version/s: Jena 3.7.0
    • Component/s: Jena, Text
    • Labels:
      None

      Description

      This issue proposes an improvement to jena-text to include optional highlighting of results via:

      org.apache.lucene.search.highlight.Highlighter

      and

      org.apache.lucene.search.highlight.SimpleHTMLFormatter

      The improvement will add an optional input argument to TextQueryPF that:

      • signals that highlighting should be performed on the Lucene search results;
      • optionally indicates the start and end char sequences of a highlighted term;
      • optionally indicates the maximum number of fragments to highlight; and
      • optionally indicates a fragment separator.
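      The examples in this issue use an option string of the form "highlight:" followed by key:value items separated by "|". The following is a minimal, self-contained Java sketch of one way such a string could be interpreted. It is not jena-text's actual parser. The class name HighlightOptions and its methods are hypothetical, and only the "s:" and "e:" keys and the default arrow markers come from this issue. Keys for the maximum number of fragments and the fragment separator are not named here, so the sketch simply collects every key:value item into a map.

```java
import java.util.LinkedHashMap;
import java.util.Map;

/** Hypothetical sketch: interprets a "highlight:" option string as key:value items. */
final class HighlightOptions {
    static final String PREFIX = "highlight:";
    // Default markers described in this issue: U+21A6 and U+21A4.
    static final String DEFAULT_START = "\u21a6";
    static final String DEFAULT_END = "\u21a4";

    final Map<String, String> values = new LinkedHashMap<>();

    /** Returns null when the string is not a highlight option string. */
    static HighlightOptions parse(String option) {
        if (option == null || !option.startsWith(PREFIX)) {
            return null;
        }
        HighlightOptions opts = new HighlightOptions();
        String body = option.substring(PREFIX.length());
        for (String part : body.split("\\|")) {
            String item = part.trim();
            int colon = item.indexOf(':');
            if (colon > 0) {
                // Split at the first ':' only, so a value may itself contain ':'.
                opts.values.put(item.substring(0, colon).trim(), item.substring(colon + 1));
            }
        }
        return opts;
    }

    /** Start marker from "s:", or the default right-arrow. */
    String start() { return values.getOrDefault("s", DEFAULT_START); }

    /** End marker from "e:", or the default left-arrow. */
    String end() { return values.getOrDefault("e", DEFAULT_END); }

    public static void main(String[] args) {
        HighlightOptions o = parse("highlight: s:<em class='hilite'> | e:</em>");
        System.out.println(o.start() + "one" + o.end());
        // prints: <em class='hilite'>one</em>
    }
}
```

      In this sketch each item is trimmed before splitting, so a marker cannot contain "|" or rely on trailing spaces; a real implementation may choose different escaping rules.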

      The highlighted results are bound to the ?literal output argument of TextQueryPF.

      The change adds only a simple extraction of the highlight option string and a single test for its presence, so it has minimal impact when highlighting is not used. The highlight option string is passed directly to TextIndex.query(...), so it can also be used from code other than TextQueryPF.
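      The "extract once, test once" idea can be sketched as follows. This is a simplified, hypothetical illustration: the helper name extractHighlight is invented, and it treats the text:query arguments as a plain list of strings rather than the SPARQL nodes TextQueryPF actually receives. When no argument starts with "highlight:", the result is null and a caller would skip all highlighting work.

```java
import java.util.List;

/** Hypothetical sketch: locate an optional "highlight:" argument among query arguments. */
final class HighlightArgs {
    static final String PREFIX = "highlight:";

    /** Returns the first argument starting with "highlight:", or null if there is none. */
    static String extractHighlight(List<String> args) {
        for (String arg : args) {
            if (arg != null && arg.startsWith(PREFIX)) {
                return arg;
            }
        }
        return null;
    }

    public static void main(String[] args) {
        // Simplified stand-in for the string arguments in the first example query.
        List<String> queryArgs = List.of("one", "lang:en", "highlight:");
        String hl = extractHighlight(queryArgs);
        // The single presence test: only pass highlight options on when hl != null.
        System.out.println(hl != null ? "highlighting requested" : "no highlighting");
    }
}
```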

      The simplest use of highlighting looks like this:

      select ?s ?lit
      where {
        (?s ?sc ?lit) text:query (skos:prefLabel "one" 100 "lang:en" "highlight:") .
      }
      

      which will produce results such as:

      "another ↦one↤ abc"@en
      

      The right-arrow (\u21a6, ↦) and left-arrow (\u21a4, ↤) are the default start and end highlighting character sequences. They were chosen because they are very unlikely to occur in literals. They can easily be changed via "s:" and "e:" in the highlight options, for example:

      select ?s ?lit
      where {
        (?s ?sc ?lit) text:query (skos:prefLabel "one" 100 "lang:en" "highlight: s:<em class='hilite'> | e:</em>") .
      }

      which will produce results such as:

      "another <em class='hilite'>one</em> abc"@en
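      To show the effect of the start and end sequences on a literal, here is a naive Java sketch that reproduces the two results above. It is deliberately not Lucene's Highlighter: it does no analysis, scoring, or fragmenting, and simply wraps whole-word, case-insensitive matches of a single term using java.util.regex. The class NaiveHighlighter and its method are hypothetical.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

/** Hypothetical sketch: wrap whole-word matches of one term in start/end markers. */
final class NaiveHighlighter {
    static String highlight(String text, String term, String start, String end) {
        // Pattern.quote treats the term literally; \b restricts matches to whole words.
        Pattern p = Pattern.compile("\\b" + Pattern.quote(term) + "\\b",
                Pattern.CASE_INSENSITIVE | Pattern.UNICODE_CASE);
        Matcher m = p.matcher(text);
        StringBuilder sb = new StringBuilder();
        while (m.find()) {
            // quoteReplacement keeps any '$' or '\' in the markers literal.
            m.appendReplacement(sb, Matcher.quoteReplacement(start + m.group() + end));
        }
        m.appendTail(sb);
        return sb.toString();
    }

    public static void main(String[] args) {
        // Default markers: another ↦one↤ abc
        System.out.println(highlight("another one abc", "one", "\u21a6", "\u21a4"));
        // Custom markers: another <em class='hilite'>one</em> abc
        System.out.println(highlight("another one abc", "one", "<em class='hilite'>", "</em>"));
    }
}
```

      Lucene's Highlighter instead works on the analyzed token stream, so it can mark stemmed or otherwise normalized matches and pick the best-scoring fragments; the sketch only illustrates how the markers end up in the literal.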
      

      Coding of this improvement is complete; a PR can be issued if there is agreement that it should be included in jena-text.


              People

              • Assignee: Code Ferret (code-ferret)
                Reporter: Code Ferret (code-ferret)
              • Votes: 0
                Watchers: 4


                  Time Tracking

                  • Estimated: 24h (original estimate)
                  • Remaining: 24h
                  • Logged: Not Specified