Details

    • Type: New Feature
    • Status: Resolved
    • Priority: Major
    • Resolution: Fixed
    • Affects Version/s: None
    • Fix Version/s: 6.4
    • Component/s: highlighter
    • Security Level: Public (Default Security Level. Issues are Public)
    • Labels:
      None
    • Flags:
      Patch

      Description

      This ticket is for creating a Solr plugin that can utilize the new UnifiedHighlighter which was initially committed in https://issues.apache.org/jira/browse/LUCENE-7438

      1. SOLR-9708.patch
        56 kB
        David Smiley

        Issue Links

          Activity

          Hide
          githubbot ASF GitHub Bot added a comment -

          GitHub user Timothy055 opened a pull request:

          https://github.com/apache/lucene-solr/pull/107

          SOLR-9708 UnifiedHighlighter Solr Plugin

          An initial implementation of a solr plugin for the unified highlighter.

          You can merge this pull request into a Git repository by running:

          $ git pull https://github.com/Timothy055/lucene-solr features/SOLR-9708

          Alternatively you can review and apply these changes as the patch at:

          https://github.com/apache/lucene-solr/pull/107.patch

          To close this pull request, make a commit to your master/trunk branch
          with (at least) the following in the commit message:

          This closes #107


          commit 669f94f27c73b9fd5190a980a089cff9d4df3015
          Author: Timothy Rodriguez <trodriguez25@bloomberg.net>
          Date: 2016-10-31T21:05:47Z

          initial commit of UnifiedSolrHighlighter adapter

          commit 2cf4b2d35f1f535e8b8126b5dd685dc3bb663391
          Author: Timothy Rodriguez <trodriguez25@bloomberg.net>
          Date: 2016-10-31T21:12:19Z

          add license header


          Show
          githubbot ASF GitHub Bot added a comment - GitHub user Timothy055 opened a pull request: https://github.com/apache/lucene-solr/pull/107 SOLR-9708 UnifiedHighlighter Solr Plugin An initial implementation of a solr plugin for the unified highlighter. You can merge this pull request into a Git repository by running: $ git pull https://github.com/Timothy055/lucene-solr features/ SOLR-9708 Alternatively you can review and apply these changes as the patch at: https://github.com/apache/lucene-solr/pull/107.patch To close this pull request, make a commit to your master/trunk branch with (at least) the following in the commit message: This closes #107 commit 669f94f27c73b9fd5190a980a089cff9d4df3015 Author: Timothy Rodriguez <trodriguez25@bloomberg.net> Date: 2016-10-31T21:05:47Z initial commit of UnifiedSolrHighlighter adapter commit 2cf4b2d35f1f535e8b8126b5dd685dc3bb663391 Author: Timothy Rodriguez <trodriguez25@bloomberg.net> Date: 2016-10-31T21:12:19Z add license header
          Hide
          dsmiley David Smiley added a comment -

          To anyone following along: The code is the PostingsHighlighter's Solr adapter, with just a few changes to work with the UH instead.

          Show
          dsmiley David Smiley added a comment - To anyone following along: The code is the PostingsHighlighter's Solr adapter, with just a few changes to work with the UH instead.
          Hide
          Timothy055 Timothy M. Rodriguez added a comment -

          I've pushed tests for the configurable items in the UH as well as for support of multiple snippets. In addition a change was done to push highlighter specific logic down into the DefaultSolrHighlighter that was in the HighlightComponent (thanks David Smiley for pointing that out).

          Show
          Timothy055 Timothy M. Rodriguez added a comment - I've pushed tests for the configurable items in the UH as well as for support of multiple snippets. In addition a change was done to push highlighter specific logic down into the DefaultSolrHighlighter that was in the HighlightComponent (thanks David Smiley for pointing that out).
          Hide
          Timothy055 Timothy M. Rodriguez added a comment -

          Let me know what you think. If it looks good, I think we can commit it.

          Show
          Timothy055 Timothy M. Rodriguez added a comment - Let me know what you think. If it looks good, I think we can commit it.
          Hide
          githubbot ASF GitHub Bot added a comment -

          Github user dsmiley commented on a diff in the pull request:

          https://github.com/apache/lucene-solr/pull/107#discussion_r87740749

          — Diff: solr/core/src/test-files/solr/collection1/conf/solrconfig-unifiedhighlight.xml —
          @@ -0,0 +1,35 @@
          +<?xml version="1.0" ?>
          +
          +<!--
          + Licensed to the Apache Software Foundation (ASF) under one or more
          + contributor license agreements. See the NOTICE file distributed with
          + this work for additional information regarding copyright ownership.
          + The ASF licenses this file to You under the Apache License, Version 2.0
          + (the "License"); you may not use this file except in compliance with
          + the License. You may obtain a copy of the License at
          +
          + http://www.apache.org/licenses/LICENSE-2.0
          +
          + Unless required by applicable law or agreed to in writing, software
          + distributed under the License is distributed on an "AS IS" BASIS,
          + WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
          + See the License for the specific language governing permissions and
          + limitations under the License.
          +-->
          +
          +<!-- a basic solrconfig for postings highlighter -->
          — End diff –

          obsolete postings highlighter reference

          Show
          githubbot ASF GitHub Bot added a comment - Github user dsmiley commented on a diff in the pull request: https://github.com/apache/lucene-solr/pull/107#discussion_r87740749 — Diff: solr/core/src/test-files/solr/collection1/conf/solrconfig-unifiedhighlight.xml — @@ -0,0 +1,35 @@ +<?xml version="1.0" ?> + +<!-- + Licensed to the Apache Software Foundation (ASF) under one or more + contributor license agreements. See the NOTICE file distributed with + this work for additional information regarding copyright ownership. + The ASF licenses this file to You under the Apache License, Version 2.0 + (the "License"); you may not use this file except in compliance with + the License. You may obtain a copy of the License at + + http://www.apache.org/licenses/LICENSE-2.0 + + Unless required by applicable law or agreed to in writing, software + distributed under the License is distributed on an "AS IS" BASIS, + WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. + See the License for the specific language governing permissions and + limitations under the License. +--> + +<!-- a basic solrconfig for postings highlighter --> — End diff – obsolete postings highlighter reference
          Hide
          githubbot ASF GitHub Bot added a comment -

          Github user dsmiley commented on a diff in the pull request:

          https://github.com/apache/lucene-solr/pull/107#discussion_r87743142

          — Diff: solr/core/src/java/org/apache/solr/highlight/DefaultSolrHighlighter.java —
          @@ -373,6 +373,11 @@ protected BoundaryScanner getBoundaryScanner(String fieldName, SolrParams params
          if (!isHighlightingEnabled(params)) // also returns early if no unique key field
          return null;

          + boolean rewrite = query != null && !(Boolean.valueOf(params.get(HighlightParams.USE_PHRASE_HIGHLIGHTER, "true")) &&
          — End diff –

          no biggie but I think a simple if(...) condition would be simpler; no variable.

          Show
          githubbot ASF GitHub Bot added a comment - Github user dsmiley commented on a diff in the pull request: https://github.com/apache/lucene-solr/pull/107#discussion_r87743142 — Diff: solr/core/src/java/org/apache/solr/highlight/DefaultSolrHighlighter.java — @@ -373,6 +373,11 @@ protected BoundaryScanner getBoundaryScanner(String fieldName, SolrParams params if (!isHighlightingEnabled(params)) // also returns early if no unique key field return null; + boolean rewrite = query != null && !(Boolean.valueOf(params.get(HighlightParams.USE_PHRASE_HIGHLIGHTER, "true")) && — End diff – no biggie but I think a simple if(...) condition would be simpler; no variable.
          Hide
          githubbot ASF GitHub Bot added a comment -

          Github user dsmiley commented on a diff in the pull request:

          https://github.com/apache/lucene-solr/pull/107#discussion_r87741032

          — Diff: solr/core/src/java/org/apache/solr/highlight/UnifiedSolrHighlighter.java —
          @@ -0,0 +1,366 @@
          +/*
          + * Licensed to the Apache Software Foundation (ASF) under one or more
          + * contributor license agreements. See the NOTICE file distributed with
          + * this work for additional information regarding copyright ownership.
          + * The ASF licenses this file to You under the Apache License, Version 2.0
          + * (the "License"); you may not use this file except in compliance with
          + * the License. You may obtain a copy of the License at
          + *
          + * http://www.apache.org/licenses/LICENSE-2.0
          + *
          + * Unless required by applicable law or agreed to in writing, software
          + * distributed under the License is distributed on an "AS IS" BASIS,
          + * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
          + * See the License for the specific language governing permissions and
          + * limitations under the License.
          + */
          +package org.apache.solr.highlight;
          +
          +import java.io.IOException;
          +import java.text.BreakIterator;
          +import java.util.Collections;
          +import java.util.List;
          +import java.util.Locale;
          +import java.util.Map;
          +import java.util.Set;
          +
          +import org.apache.lucene.document.Document;
          +import org.apache.lucene.search.DocIdSetIterator;
          +import org.apache.lucene.search.Query;
          +import org.apache.lucene.search.postingshighlight.WholeBreakIterator;
          +import org.apache.lucene.search.uhighlight.DefaultPassageFormatter;
          +import org.apache.lucene.search.uhighlight.PassageFormatter;
          +import org.apache.lucene.search.uhighlight.PassageScorer;
          +import org.apache.lucene.search.uhighlight.UnifiedHighlighter;
          +import org.apache.solr.common.params.HighlightParams;
          +import org.apache.solr.common.params.SolrParams;
          +import org.apache.solr.common.util.NamedList;
          +import org.apache.solr.common.util.SimpleOrderedMap;
          +import org.apache.solr.core.PluginInfo;
          +import org.apache.solr.request.SolrQueryRequest;
          +import org.apache.solr.request.SolrRequestInfo;
          +import org.apache.solr.schema.IndexSchema;
          +import org.apache.solr.schema.SchemaField;
          +import org.apache.solr.search.DocIterator;
          +import org.apache.solr.search.DocList;
          +import org.apache.solr.search.SolrIndexSearcher;
          +import org.apache.solr.util.RTimerTree;
          +import org.apache.solr.util.plugin.PluginInfoInitialized;
          +
          +/*
          + * TODO: The HighlightComponent should not call rewrite on the query; it should be up to the
          + * SolrHighlighter to do if needed. Furthermore this arrangement is odd – why are these abstractions separate?
          + */
          +
          +/**
          + * Highlighter impl that uses

          {@link UnifiedHighlighter}

          + * <p>
          + * Example configuration with default values:
          + * <pre class="prettyprint">
          + * <requestHandler name="standard" class="solr.StandardRequestHandler">
          + * <lst name="defaults">
          + * <int name="hl.snippets">1</int>
          + * <str name="hl.tag.pre">&lt;em&gt;</str>
          + * <str name="hl.tag.post">&lt;/em&gt;</str>
          + * <str name="hl.tag.ellipsis">... </str>
          + * <bool name="hl.defaultSummary">true</bool>
          + * <str name="hl.encoder">simple</str>
          + * <float name="hl.score.k1">1.2</float>
          + * <float name="hl.score.b">0.75</float>
          + * <float name="hl.score.pivot">87</float>
          + * <str name="hl.bs.language"></str>
          + * <str name="hl.bs.country"></str>
          + * <str name="hl.bs.variant"></str>
          + * <str name="hl.bs.type">SENTENCE</str>
          + * <int name="hl.maxAnalyzedChars">10000</int>
          + * <bool name="hl.highlightMultiTerm">true</bool>
          + * </lst>
          + * </requestHandler>
          + * </pre>
          + * ...
          + * <pre class="prettyprint">
          + * <searchComponent class="solr.HighlightComponent" name="highlight">
          + * <highlighting class="org.apache.solr.highlight.UnifiedSolrHighlighter"/>
          + * </searchComponent>
          + * </pre>
          + * <p>
          + * Notes:
          + * <ul>
          + * <li>hl.q (string) can specify the query
          + * <li>hl.fl (string) specifies the field list.
          + * <li>hl.snippets (int) specifies how many snippets to return.
          + * <li>hl.tag.pre (string) specifies text which appears before a highlighted term.
          + * <li>hl.tag.post (string) specifies text which appears after a highlighted term.
          + * <li>hl.tag.ellipsis (string) specifies text which joins non-adjacent passages. The default is to retain each
          + * value in a list without joining them.
          + * <li>hl.defaultSummary (bool) specifies if a field should have a default summary of the leading text.
          + * <li>hl.encoder (string) can be 'html' (html escapes content) or 'simple' (no escaping).
          + * <li>hl.score.k1 (float) specifies bm25 scoring parameter 'k1'
          + * <li>hl.score.b (float) specifies bm25 scoring parameter 'b'
          + * <li>hl.score.pivot (float) specifies bm25 scoring parameter 'avgdl'
          + * <li>hl.bs.type (string) specifies how to divide text into passages: [SENTENCE, LINE, WORD, CHAR, WHOLE]
          + * <li>hl.bs.language (string) specifies language code for BreakIterator. default is empty string (root locale)
          + * <li>hl.bs.country (string) specifies country code for BreakIterator. default is empty string (root locale)
          + * <li>hl.bs.variant (string) specifies country code for BreakIterator. default is empty string (root locale)
          + * <li>hl.maxAnalyzedChars specifies how many characters at most will be processed in a document for any one field.
          + * <li>hl.highlightMultiTerm enables highlighting for range/wildcard/fuzzy/prefix queries at some cost.
          + * <li>hl.usePhraseHighlighter (bool) enables highlighting phrases and some other queries strictly at some cost.</li>
          — End diff –

          We know this is actually faster. So I think we can remove the reference here. But I think we forgot it in the list above to indicate it's true by default.

          Show
          githubbot ASF GitHub Bot added a comment - Github user dsmiley commented on a diff in the pull request: https://github.com/apache/lucene-solr/pull/107#discussion_r87741032 — Diff: solr/core/src/java/org/apache/solr/highlight/UnifiedSolrHighlighter.java — @@ -0,0 +1,366 @@ +/* + * Licensed to the Apache Software Foundation (ASF) under one or more + * contributor license agreements. See the NOTICE file distributed with + * this work for additional information regarding copyright ownership. + * The ASF licenses this file to You under the Apache License, Version 2.0 + * (the "License"); you may not use this file except in compliance with + * the License. You may obtain a copy of the License at + * + * http://www.apache.org/licenses/LICENSE-2.0 + * + * Unless required by applicable law or agreed to in writing, software + * distributed under the License is distributed on an "AS IS" BASIS, + * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. + * See the License for the specific language governing permissions and + * limitations under the License. + */ +package org.apache.solr.highlight; + +import java.io.IOException; +import java.text.BreakIterator; +import java.util.Collections; +import java.util.List; +import java.util.Locale; +import java.util.Map; +import java.util.Set; + +import org.apache.lucene.document.Document; +import org.apache.lucene.search.DocIdSetIterator; +import org.apache.lucene.search.Query; +import org.apache.lucene.search.postingshighlight.WholeBreakIterator; +import org.apache.lucene.search.uhighlight.DefaultPassageFormatter; +import org.apache.lucene.search.uhighlight.PassageFormatter; +import org.apache.lucene.search.uhighlight.PassageScorer; +import org.apache.lucene.search.uhighlight.UnifiedHighlighter; +import org.apache.solr.common.params.HighlightParams; +import org.apache.solr.common.params.SolrParams; +import org.apache.solr.common.util.NamedList; +import org.apache.solr.common.util.SimpleOrderedMap; +import org.apache.solr.core.PluginInfo; +import org.apache.solr.request.SolrQueryRequest; +import org.apache.solr.request.SolrRequestInfo; +import org.apache.solr.schema.IndexSchema; +import org.apache.solr.schema.SchemaField; +import org.apache.solr.search.DocIterator; +import org.apache.solr.search.DocList; +import org.apache.solr.search.SolrIndexSearcher; +import org.apache.solr.util.RTimerTree; +import org.apache.solr.util.plugin.PluginInfoInitialized; + +/* + * TODO: The HighlightComponent should not call rewrite on the query; it should be up to the + * SolrHighlighter to do if needed. Furthermore this arrangement is odd – why are these abstractions separate? + */ + +/** + * Highlighter impl that uses {@link UnifiedHighlighter} + * <p> + * Example configuration with default values: + * <pre class="prettyprint"> + * <requestHandler name="standard" class="solr.StandardRequestHandler"> + * <lst name="defaults"> + * <int name="hl.snippets">1</int> + * <str name="hl.tag.pre">&lt;em&gt;</str> + * <str name="hl.tag.post">&lt;/em&gt;</str> + * <str name="hl.tag.ellipsis">... </str> + * <bool name="hl.defaultSummary">true</bool> + * <str name="hl.encoder">simple</str> + * <float name="hl.score.k1">1.2</float> + * <float name="hl.score.b">0.75</float> + * <float name="hl.score.pivot">87</float> + * <str name="hl.bs.language"></str> + * <str name="hl.bs.country"></str> + * <str name="hl.bs.variant"></str> + * <str name="hl.bs.type">SENTENCE</str> + * <int name="hl.maxAnalyzedChars">10000</int> + * <bool name="hl.highlightMultiTerm">true</bool> + * </lst> + * </requestHandler> + * </pre> + * ... + * <pre class="prettyprint"> + * <searchComponent class="solr.HighlightComponent" name="highlight"> + * <highlighting class="org.apache.solr.highlight.UnifiedSolrHighlighter"/> + * </searchComponent> + * </pre> + * <p> + * Notes: + * <ul> + * <li>hl.q (string) can specify the query + * <li>hl.fl (string) specifies the field list. + * <li>hl.snippets (int) specifies how many snippets to return. + * <li>hl.tag.pre (string) specifies text which appears before a highlighted term. + * <li>hl.tag.post (string) specifies text which appears after a highlighted term. + * <li>hl.tag.ellipsis (string) specifies text which joins non-adjacent passages. The default is to retain each + * value in a list without joining them. + * <li>hl.defaultSummary (bool) specifies if a field should have a default summary of the leading text. + * <li>hl.encoder (string) can be 'html' (html escapes content) or 'simple' (no escaping). + * <li>hl.score.k1 (float) specifies bm25 scoring parameter 'k1' + * <li>hl.score.b (float) specifies bm25 scoring parameter 'b' + * <li>hl.score.pivot (float) specifies bm25 scoring parameter 'avgdl' + * <li>hl.bs.type (string) specifies how to divide text into passages: [SENTENCE, LINE, WORD, CHAR, WHOLE] + * <li>hl.bs.language (string) specifies language code for BreakIterator. default is empty string (root locale) + * <li>hl.bs.country (string) specifies country code for BreakIterator. default is empty string (root locale) + * <li>hl.bs.variant (string) specifies country code for BreakIterator. default is empty string (root locale) + * <li>hl.maxAnalyzedChars specifies how many characters at most will be processed in a document for any one field. + * <li>hl.highlightMultiTerm enables highlighting for range/wildcard/fuzzy/prefix queries at some cost. + * <li>hl.usePhraseHighlighter (bool) enables highlighting phrases and some other queries strictly at some cost.</li> — End diff – We know this is actually faster. So I think we can remove the reference here. But I think we forgot it in the list above to indicate it's true by default.
          Hide
          githubbot ASF GitHub Bot added a comment -

          Github user dsmiley commented on a diff in the pull request:

          https://github.com/apache/lucene-solr/pull/107#discussion_r87740550

          — Diff: solr/core/src/test/org/apache/solr/highlight/TestUnifiedSolrHighlighter.java —
          @@ -0,0 +1,222 @@
          +/*
          + * Licensed to the Apache Software Foundation (ASF) under one or more
          + * contributor license agreements. See the NOTICE file distributed with
          + * this work for additional information regarding copyright ownership.
          + * The ASF licenses this file to You under the Apache License, Version 2.0
          + * (the "License"); you may not use this file except in compliance with
          + * the License. You may obtain a copy of the License at
          + *
          + * http://www.apache.org/licenses/LICENSE-2.0
          + *
          + * Unless required by applicable law or agreed to in writing, software
          + * distributed under the License is distributed on an "AS IS" BASIS,
          + * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
          + * See the License for the specific language governing permissions and
          + * limitations under the License.
          + */
          +package org.apache.solr.highlight;
          +
          +import org.apache.solr.SolrTestCaseJ4;
          +import org.apache.solr.handler.component.HighlightComponent;
          +import org.apache.solr.schema.IndexSchema;
          +import org.junit.BeforeClass;
          +import org.junit.Ignore;
          +
          +/** simple tests for PostingsSolrHighlighter */
          +public class TestUnifiedSolrHighlighter extends SolrTestCaseJ4 {
          +
          + @BeforeClass
          + public static void beforeClass() throws Exception

          { + initCore("solrconfig-unifiedhighlight.xml", "schema-unifiedhighlight.xml"); + + // test our config is sane, just to be sure: + + // postingshighlighter should be used + SolrHighlighter highlighter = HighlightComponent.getHighlighter(h.getCore()); + assertTrue("wrong highlighter: " + highlighter.getClass(), highlighter instanceof UnifiedSolrHighlighter); + + // 'text' and 'text3' should have offsets, 'text2' should not + IndexSchema schema = h.getCore().getLatestSchema(); + assertTrue(schema.getField("text").storeOffsetsWithPositions()); + assertTrue(schema.getField("text3").storeOffsetsWithPositions()); + assertFalse(schema.getField("text2").storeOffsetsWithPositions()); + }

          +
          + @Override
          + public void setUp() throws Exception

          { + super.setUp(); + clearIndex(); + assertU(adoc("text", "document one", "text2", "document one", "text3", "crappy document", "id", "101")); + assertU(adoc("text", "second document", "text2", "second document", "text3", "crappier document", "id", "102")); + assertU(commit()); + }

          +
          + public void testSimple()

          { + assertQ("simplest test", + req("q", "text:document", "sort", "id asc", "hl", "true"), + "count(//lst[@name='highlighting']/*)=2", + "//lst[@name='highlighting']/lst[@name='101']/arr[@name='text']/str='<em>document</em> one'", + "//lst[@name='highlighting']/lst[@name='102']/arr[@name='text']/str='second <em>document</em>'"); + }

          +
          + public void testMultipleSnippetsReturned()

          { + clearIndex(); + assertU(adoc("text", "Document snippet one. Intermediate sentence. Document snippet two.", + "text2", "document one", "text3", "crappy document", "id", "101")); + assertU(commit()); + assertQ("multiple snippets test", + req("q", "text:document", "sort", "id asc", "hl", "true", "hl.snippets", "2", "hl.bs.type", "SENTENCE"), + "count(//lst[@name='highlighting']/lst[@name='101']/arr[@name='text']/*)=2", + "//lst[@name='highlighting']/lst[@name='101']/arr/str[1]='<em>Document</em> snippet one. '", + "//lst[@name='highlighting']/lst[@name='101']/arr/str[2]='<em>Document</em> snippet two.'"); + }

          +
          + public void testStrictPhrasesEnabledByDefault()

          { + clearIndex(); + assertU(adoc("text", "Strict phrases should be enabled for phrases", + "text2", "document one", "text3", "crappy document", "id", "101")); + assertU(commit()); + assertQ("strict phrase handling", + req("q", "text:\"strict phrases\"", "sort", "id asc", "hl", "true"), + "count(//lst[@name='highlighting']/lst[@name='101']/arr[@name='text']/*)=1", + "//lst[@name='highlighting']/lst[@name='101']/arr/str[1]='<em>Strict</em> <em>phrases</em> should be enabled for phrases'"); + }

          +
          + public void testStrictPhrasesCanBeDisabled()

          { + clearIndex(); + assertU(adoc("text", "Strict phrases should be disabled for phrases", + "text2", "document one", "text3", "crappy document", "id", "101")); + assertU(commit()); + assertQ("strict phrase handling", + req("q", "text:\"strict phrases\"", "sort", "id asc", "hl", "true", "hl.usePhraseHighlighter", "false"), + "count(//lst[@name='highlighting']/lst[@name='101']/arr[@name='text']/*)=1", + "//lst[@name='highlighting']/lst[@name='101']/arr/str[1]='<em>Strict</em> <em>phrases</em> should be disabled for <em>phrases</em>'"); + }

          +
          + public void testMultiTermQueryEnabledByDefault()

          { + clearIndex(); + assertU(adoc("text", "Aviary Avenue document", + "text2", "document one", "text3", "crappy document", "id", "101")); + assertU(commit()); + assertQ("multi term query handling", + req("q", "text:av*", "sort", "id asc", "hl", "true"), + "count(//lst[@name='highlighting']/lst[@name='101']/arr[@name='text']/*)=1", + "//lst[@name='highlighting']/lst[@name='101']/arr/str[1]='<em>Aviary</em> <em>Avenue</em> document'"); + }

          +
          + public void testMultiTermQueryCanBeDisabled()

          { + clearIndex(); + assertU(adoc("text", "Aviary Avenue document", + "text2", "document one", "text3", "crappy document", "id", "101")); + assertU(commit()); + assertQ("multi term query handling", + req("q", "text:av*", "sort", "id asc", "hl", "true", "hl.highlightMultiTerm", "false"), + "count(//lst[@name='highlighting']/lst[@name='101']/arr[@name='text']/*)=0"); + }

          +
          + public void testPagination()

          { + assertQ("pagination test", + req("q", "text:document", "sort", "id asc", "hl", "true", "rows", "1", "start", "1"), + "count(//lst[@name='highlighting']/*)=1", + "//lst[@name='highlighting']/lst[@name='102']/arr[@name='text']/str='second <em>document</em>'"); + }

          +
          + public void testEmptySnippet()

          { + assertQ("null snippet test", + req("q", "text:one OR *:*", "sort", "id asc", "hl", "true"), + "count(//lst[@name='highlighting']/*)=2", + "//lst[@name='highlighting']/lst[@name='101']/arr[@name='text']/str='document <em>one</em>'", + "count(//lst[@name='highlighting']/lst[@name='102']/arr[@name='text']/*)=0"); + }

          +
          + public void testDefaultSummary()

          { + assertQ("null snippet test", + req("q", "text:one OR *:*", "sort", "id asc", "hl", "true", "hl.defaultSummary", "true"), + "count(//lst[@name='highlighting']/*)=2", + "//lst[@name='highlighting']/lst[@name='101']/arr[@name='text']/str='document <em>one</em>'", + "//lst[@name='highlighting']/lst[@name='102']/arr[@name='text']/str='second document'"); + }

          +
          + public void testDifferentField()

          { + assertQ("highlighting text3", + req("q", "text3:document", "sort", "id asc", "hl", "true", "hl.fl", "text3"), + "count(//lst[@name='highlighting']/*)=2", + "//lst[@name='highlighting']/lst[@name='101']/arr[@name='text3']/str='crappy <em>document</em>'", + "//lst[@name='highlighting']/lst[@name='102']/arr[@name='text3']/str='crappier <em>document</em>'"); + }

          +
          + public void testTwoFields()

          { + assertQ("highlighting text and text3", + req("q", "text:document text3:document", "sort", "id asc", "hl", "true", "hl.fl", "text,text3"), + "count(//lst[@name='highlighting']/*)=2", + "//lst[@name='highlighting']/lst[@name='101']/arr[@name='text']/str='<em>document</em> one'", + "//lst[@name='highlighting']/lst[@name='101']/arr[@name='text3']/str='crappy <em>document</em>'", + "//lst[@name='highlighting']/lst[@name='102']/arr[@name='text']/str='second <em>document</em>'", + "//lst[@name='highlighting']/lst[@name='102']/arr[@name='text3']/str='crappier <em>document</em>'"); + }

          +
          + //todo: need to configure field that is not at least stored, hence no analysis
          + //otherwise, this highlighter is resilient
          + @Ignore
          + public void testMisconfiguredField() {
          + ignoreException("was indexed without offsets");
          + try

          { + assertQ("should fail, has no offsets", + req("q", "text2:document", "sort", "id asc", "hl", "true", "hl.fl", "text2")); + fail(); + }

          catch (Exception expected)

          { + // expected + }

          + resetExceptionIgnores();
          + }
          +
          + public void testTags()

          { + assertQ("different pre/post tags", + req("q", "text:document", "sort", "id asc", "hl", "true", "hl.tag.pre", "[", "hl.tag.post", "]"), + "count(//lst[@name='highlighting']/*)=2", + "//lst[@name='highlighting']/lst[@name='101']/arr[@name='text']/str='[document] one'", + "//lst[@name='highlighting']/lst[@name='102']/arr[@name='text']/str='second [document]'"); + }

          +
          + public void testTagsPerField()

          { + assertQ("highlighting text and text3", + req("q", "text:document text3:document", "sort", "id asc", "hl", "true", "hl.fl", "text,text3", "f.text3.hl.tag.pre", "[", "f.text3.hl.tag.post", "]"), + "count(//lst[@name='highlighting']/*)=2", + "//lst[@name='highlighting']/lst[@name='101']/arr[@name='text']/str='<em>document</em> one'", + "//lst[@name='highlighting']/lst[@name='101']/arr[@name='text3']/str='crappy [document]'", + "//lst[@name='highlighting']/lst[@name='102']/arr[@name='text']/str='second <em>document</em>'", + "//lst[@name='highlighting']/lst[@name='102']/arr[@name='text3']/str='crappier [document]'"); + }

          +
          + public void testBreakIterator()

          { + assertQ("different breakiterator", + req("q", "text:document", "sort", "id asc", "hl", "true", "hl.bs.type", "WORD"), + "count(//lst[@name='highlighting']/*)=2", + "//lst[@name='highlighting']/lst[@name='101']/arr[@name='text']/str='<em>document</em>'", + "//lst[@name='highlighting']/lst[@name='102']/arr[@name='text']/str='<em>document</em>'"); + }

          +
          + public void testBreakIterator2()

          { + assertU(adoc("text", "Document one has a first sentence. Document two has a second sentence.", "id", "103")); + assertU(commit()); + assertQ("different breakiterator", + req("q", "text:document", "sort", "id asc", "hl", "true", "hl.bs.type", "WHOLE"), + "//lst[@name='highlighting']/lst[@name='103']/arr[@name='text']/str='<em>Document</em> one has a first sentence. <em>Document</em> two has a second sentence.'"); + }

          +
          + public void testEncoder()

          { + assertU(adoc("text", "Document one has a first <i>sentence</i>.", "id", "103")); + assertU(commit()); + assertQ("html escaped", + req("q", "text:document", "sort", "id asc", "hl", "true", "hl.encoder", "html"), + "//lst[@name='highlighting']/lst[@name='103']/arr[@name='text']/str='<em>Document</em> one has a first <i>sentence</i>.'"); + }

          +
          + public void testWildcard() {
          — End diff –

          This test is obsoleted by ones you added above.

          Show
          githubbot ASF GitHub Bot added a comment - Github user dsmiley commented on a diff in the pull request: https://github.com/apache/lucene-solr/pull/107#discussion_r87740550 — Diff: solr/core/src/test/org/apache/solr/highlight/TestUnifiedSolrHighlighter.java — @@ -0,0 +1,222 @@ +/* + * Licensed to the Apache Software Foundation (ASF) under one or more + * contributor license agreements. See the NOTICE file distributed with + * this work for additional information regarding copyright ownership. + * The ASF licenses this file to You under the Apache License, Version 2.0 + * (the "License"); you may not use this file except in compliance with + * the License. You may obtain a copy of the License at + * + * http://www.apache.org/licenses/LICENSE-2.0 + * + * Unless required by applicable law or agreed to in writing, software + * distributed under the License is distributed on an "AS IS" BASIS, + * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. + * See the License for the specific language governing permissions and + * limitations under the License. + */ +package org.apache.solr.highlight; + +import org.apache.solr.SolrTestCaseJ4; +import org.apache.solr.handler.component.HighlightComponent; +import org.apache.solr.schema.IndexSchema; +import org.junit.BeforeClass; +import org.junit.Ignore; + +/** simple tests for PostingsSolrHighlighter */ +public class TestUnifiedSolrHighlighter extends SolrTestCaseJ4 { + + @BeforeClass + public static void beforeClass() throws Exception { + initCore("solrconfig-unifiedhighlight.xml", "schema-unifiedhighlight.xml"); + + // test our config is sane, just to be sure: + + // postingshighlighter should be used + SolrHighlighter highlighter = HighlightComponent.getHighlighter(h.getCore()); + assertTrue("wrong highlighter: " + highlighter.getClass(), highlighter instanceof UnifiedSolrHighlighter); + + // 'text' and 'text3' should have offsets, 'text2' should not + IndexSchema schema = h.getCore().getLatestSchema(); + assertTrue(schema.getField("text").storeOffsetsWithPositions()); + assertTrue(schema.getField("text3").storeOffsetsWithPositions()); + assertFalse(schema.getField("text2").storeOffsetsWithPositions()); + } + + @Override + public void setUp() throws Exception { + super.setUp(); + clearIndex(); + assertU(adoc("text", "document one", "text2", "document one", "text3", "crappy document", "id", "101")); + assertU(adoc("text", "second document", "text2", "second document", "text3", "crappier document", "id", "102")); + assertU(commit()); + } + + public void testSimple() { + assertQ("simplest test", + req("q", "text:document", "sort", "id asc", "hl", "true"), + "count(//lst[@name='highlighting']/*)=2", + "//lst[@name='highlighting']/lst[@name='101']/arr[@name='text']/str='<em>document</em> one'", + "//lst[@name='highlighting']/lst[@name='102']/arr[@name='text']/str='second <em>document</em>'"); + } + + public void testMultipleSnippetsReturned() { + clearIndex(); + assertU(adoc("text", "Document snippet one. Intermediate sentence. Document snippet two.", + "text2", "document one", "text3", "crappy document", "id", "101")); + assertU(commit()); + assertQ("multiple snippets test", + req("q", "text:document", "sort", "id asc", "hl", "true", "hl.snippets", "2", "hl.bs.type", "SENTENCE"), + "count(//lst[@name='highlighting']/lst[@name='101']/arr[@name='text']/*)=2", + "//lst[@name='highlighting']/lst[@name='101']/arr/str[1]='<em>Document</em> snippet one. '", + "//lst[@name='highlighting']/lst[@name='101']/arr/str[2]='<em>Document</em> snippet two.'"); + } + + public void testStrictPhrasesEnabledByDefault() { + clearIndex(); + assertU(adoc("text", "Strict phrases should be enabled for phrases", + "text2", "document one", "text3", "crappy document", "id", "101")); + assertU(commit()); + assertQ("strict phrase handling", + req("q", "text:\"strict phrases\"", "sort", "id asc", "hl", "true"), + "count(//lst[@name='highlighting']/lst[@name='101']/arr[@name='text']/*)=1", + "//lst[@name='highlighting']/lst[@name='101']/arr/str[1]='<em>Strict</em> <em>phrases</em> should be enabled for phrases'"); + } + + public void testStrictPhrasesCanBeDisabled() { + clearIndex(); + assertU(adoc("text", "Strict phrases should be disabled for phrases", + "text2", "document one", "text3", "crappy document", "id", "101")); + assertU(commit()); + assertQ("strict phrase handling", + req("q", "text:\"strict phrases\"", "sort", "id asc", "hl", "true", "hl.usePhraseHighlighter", "false"), + "count(//lst[@name='highlighting']/lst[@name='101']/arr[@name='text']/*)=1", + "//lst[@name='highlighting']/lst[@name='101']/arr/str[1]='<em>Strict</em> <em>phrases</em> should be disabled for <em>phrases</em>'"); + } + + public void testMultiTermQueryEnabledByDefault() { + clearIndex(); + assertU(adoc("text", "Aviary Avenue document", + "text2", "document one", "text3", "crappy document", "id", "101")); + assertU(commit()); + assertQ("multi term query handling", + req("q", "text:av*", "sort", "id asc", "hl", "true"), + "count(//lst[@name='highlighting']/lst[@name='101']/arr[@name='text']/*)=1", + "//lst[@name='highlighting']/lst[@name='101']/arr/str[1]='<em>Aviary</em> <em>Avenue</em> document'"); + } + + public void testMultiTermQueryCanBeDisabled() { + clearIndex(); + assertU(adoc("text", "Aviary Avenue document", + "text2", "document one", "text3", "crappy document", "id", "101")); + assertU(commit()); + assertQ("multi term query handling", + req("q", "text:av*", "sort", "id asc", "hl", "true", "hl.highlightMultiTerm", "false"), + "count(//lst[@name='highlighting']/lst[@name='101']/arr[@name='text']/*)=0"); + } + + public void testPagination() { + assertQ("pagination test", + req("q", "text:document", "sort", "id asc", "hl", "true", "rows", "1", "start", "1"), + "count(//lst[@name='highlighting']/*)=1", + "//lst[@name='highlighting']/lst[@name='102']/arr[@name='text']/str='second <em>document</em>'"); + } + + public void testEmptySnippet() { + assertQ("null snippet test", + req("q", "text:one OR *:*", "sort", "id asc", "hl", "true"), + "count(//lst[@name='highlighting']/*)=2", + "//lst[@name='highlighting']/lst[@name='101']/arr[@name='text']/str='document <em>one</em>'", + "count(//lst[@name='highlighting']/lst[@name='102']/arr[@name='text']/*)=0"); + } + + public void testDefaultSummary() { + assertQ("null snippet test", + req("q", "text:one OR *:*", "sort", "id asc", "hl", "true", "hl.defaultSummary", "true"), + "count(//lst[@name='highlighting']/*)=2", + "//lst[@name='highlighting']/lst[@name='101']/arr[@name='text']/str='document <em>one</em>'", + "//lst[@name='highlighting']/lst[@name='102']/arr[@name='text']/str='second document'"); + } + + public void testDifferentField() { + assertQ("highlighting text3", + req("q", "text3:document", "sort", "id asc", "hl", "true", "hl.fl", "text3"), + "count(//lst[@name='highlighting']/*)=2", + "//lst[@name='highlighting']/lst[@name='101']/arr[@name='text3']/str='crappy <em>document</em>'", + "//lst[@name='highlighting']/lst[@name='102']/arr[@name='text3']/str='crappier <em>document</em>'"); + } + + public void testTwoFields() { + assertQ("highlighting text and text3", + req("q", "text:document text3:document", "sort", "id asc", "hl", "true", "hl.fl", "text,text3"), + "count(//lst[@name='highlighting']/*)=2", + "//lst[@name='highlighting']/lst[@name='101']/arr[@name='text']/str='<em>document</em> one'", + "//lst[@name='highlighting']/lst[@name='101']/arr[@name='text3']/str='crappy <em>document</em>'", + "//lst[@name='highlighting']/lst[@name='102']/arr[@name='text']/str='second <em>document</em>'", + "//lst[@name='highlighting']/lst[@name='102']/arr[@name='text3']/str='crappier <em>document</em>'"); + } + + //todo: need to configure field that is not at least stored, hence no analysis + //otherwise, this highlighter is resilient + @Ignore + public void testMisconfiguredField() { + ignoreException("was indexed without offsets"); + try { + assertQ("should fail, has no offsets", + req("q", "text2:document", "sort", "id asc", "hl", "true", "hl.fl", "text2")); + fail(); + } catch (Exception expected) { + // expected + } + resetExceptionIgnores(); + } + + public void testTags() { + assertQ("different pre/post tags", + req("q", "text:document", "sort", "id asc", "hl", "true", "hl.tag.pre", "[", "hl.tag.post", "]"), + "count(//lst[@name='highlighting']/*)=2", + "//lst[@name='highlighting']/lst[@name='101']/arr[@name='text']/str='[document] one'", + "//lst[@name='highlighting']/lst[@name='102']/arr[@name='text']/str='second [document]'"); + } + + public void testTagsPerField() { + assertQ("highlighting text and text3", + req("q", "text:document text3:document", "sort", "id asc", "hl", "true", "hl.fl", "text,text3", "f.text3.hl.tag.pre", "[", "f.text3.hl.tag.post", "]"), + "count(//lst[@name='highlighting']/*)=2", + "//lst[@name='highlighting']/lst[@name='101']/arr[@name='text']/str='<em>document</em> one'", + "//lst[@name='highlighting']/lst[@name='101']/arr[@name='text3']/str='crappy [document]'", + "//lst[@name='highlighting']/lst[@name='102']/arr[@name='text']/str='second <em>document</em>'", + "//lst[@name='highlighting']/lst[@name='102']/arr[@name='text3']/str='crappier [document]'"); + } + + public void testBreakIterator() { + assertQ("different breakiterator", + req("q", "text:document", "sort", "id asc", "hl", "true", "hl.bs.type", "WORD"), + "count(//lst[@name='highlighting']/*)=2", + "//lst[@name='highlighting']/lst[@name='101']/arr[@name='text']/str='<em>document</em>'", + "//lst[@name='highlighting']/lst[@name='102']/arr[@name='text']/str='<em>document</em>'"); + } + + public void testBreakIterator2() { + assertU(adoc("text", "Document one has a first sentence. Document two has a second sentence.", "id", "103")); + assertU(commit()); + assertQ("different breakiterator", + req("q", "text:document", "sort", "id asc", "hl", "true", "hl.bs.type", "WHOLE"), + "//lst[@name='highlighting']/lst[@name='103']/arr[@name='text']/str='<em>Document</em> one has a first sentence. <em>Document</em> two has a second sentence.'"); + } + + public void testEncoder() { + assertU(adoc("text", "Document one has a first <i>sentence</i>.", "id", "103")); + assertU(commit()); + assertQ("html escaped", + req("q", "text:document", "sort", "id asc", "hl", "true", "hl.encoder", "html"), + "//lst[@name='highlighting']/lst[@name='103']/arr[@name='text']/str='<em>Document</em> one has a first <i>sentence</i>.'"); + } + + public void testWildcard() { — End diff – This test is obsoleted by ones you added above.
          Hide
          githubbot ASF GitHub Bot added a comment -

          Github user dsmiley commented on a diff in the pull request:

          https://github.com/apache/lucene-solr/pull/107#discussion_r87740728

          — Diff: solr/core/src/test/org/apache/solr/highlight/TestUnifiedSolrHighlighter.java —
          @@ -0,0 +1,222 @@
          +/*
          + * Licensed to the Apache Software Foundation (ASF) under one or more
          + * contributor license agreements. See the NOTICE file distributed with
          + * this work for additional information regarding copyright ownership.
          + * The ASF licenses this file to You under the Apache License, Version 2.0
          + * (the "License"); you may not use this file except in compliance with
          + * the License. You may obtain a copy of the License at
          + *
          + * http://www.apache.org/licenses/LICENSE-2.0
          + *
          + * Unless required by applicable law or agreed to in writing, software
          + * distributed under the License is distributed on an "AS IS" BASIS,
          + * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
          + * See the License for the specific language governing permissions and
          + * limitations under the License.
          + */
          +package org.apache.solr.highlight;
          +
          +import org.apache.solr.SolrTestCaseJ4;
          +import org.apache.solr.handler.component.HighlightComponent;
          +import org.apache.solr.schema.IndexSchema;
          +import org.junit.BeforeClass;
          +import org.junit.Ignore;
          +
          +/** simple tests for PostingsSolrHighlighter */
          — End diff –

          obsolete reference to PostingsSolrHighlighter

          Show
          githubbot ASF GitHub Bot added a comment - Github user dsmiley commented on a diff in the pull request: https://github.com/apache/lucene-solr/pull/107#discussion_r87740728 — Diff: solr/core/src/test/org/apache/solr/highlight/TestUnifiedSolrHighlighter.java — @@ -0,0 +1,222 @@ +/* + * Licensed to the Apache Software Foundation (ASF) under one or more + * contributor license agreements. See the NOTICE file distributed with + * this work for additional information regarding copyright ownership. + * The ASF licenses this file to You under the Apache License, Version 2.0 + * (the "License"); you may not use this file except in compliance with + * the License. You may obtain a copy of the License at + * + * http://www.apache.org/licenses/LICENSE-2.0 + * + * Unless required by applicable law or agreed to in writing, software + * distributed under the License is distributed on an "AS IS" BASIS, + * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. + * See the License for the specific language governing permissions and + * limitations under the License. + */ +package org.apache.solr.highlight; + +import org.apache.solr.SolrTestCaseJ4; +import org.apache.solr.handler.component.HighlightComponent; +import org.apache.solr.schema.IndexSchema; +import org.junit.BeforeClass; +import org.junit.Ignore; + +/** simple tests for PostingsSolrHighlighter */ — End diff – obsolete reference to PostingsSolrHighlighter
          Hide
          githubbot ASF GitHub Bot added a comment -

          Github user dsmiley commented on a diff in the pull request:

          https://github.com/apache/lucene-solr/pull/107#discussion_r87738726

          — Diff: solr/core/src/java/org/apache/solr/highlight/UnifiedSolrHighlighter.java —
          @@ -0,0 +1,366 @@
          +/*
          + * Licensed to the Apache Software Foundation (ASF) under one or more
          + * contributor license agreements. See the NOTICE file distributed with
          + * this work for additional information regarding copyright ownership.
          + * The ASF licenses this file to You under the Apache License, Version 2.0
          + * (the "License"); you may not use this file except in compliance with
          + * the License. You may obtain a copy of the License at
          + *
          + * http://www.apache.org/licenses/LICENSE-2.0
          + *
          + * Unless required by applicable law or agreed to in writing, software
          + * distributed under the License is distributed on an "AS IS" BASIS,
          + * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
          + * See the License for the specific language governing permissions and
          + * limitations under the License.
          + */
          +package org.apache.solr.highlight;
          +
          +import java.io.IOException;
          +import java.text.BreakIterator;
          +import java.util.Collections;
          +import java.util.List;
          +import java.util.Locale;
          +import java.util.Map;
          +import java.util.Set;
          +
          +import org.apache.lucene.document.Document;
          +import org.apache.lucene.search.DocIdSetIterator;
          +import org.apache.lucene.search.Query;
          +import org.apache.lucene.search.postingshighlight.WholeBreakIterator;
          +import org.apache.lucene.search.uhighlight.DefaultPassageFormatter;
          +import org.apache.lucene.search.uhighlight.PassageFormatter;
          +import org.apache.lucene.search.uhighlight.PassageScorer;
          +import org.apache.lucene.search.uhighlight.UnifiedHighlighter;
          +import org.apache.solr.common.params.HighlightParams;
          +import org.apache.solr.common.params.SolrParams;
          +import org.apache.solr.common.util.NamedList;
          +import org.apache.solr.common.util.SimpleOrderedMap;
          +import org.apache.solr.core.PluginInfo;
          +import org.apache.solr.request.SolrQueryRequest;
          +import org.apache.solr.request.SolrRequestInfo;
          +import org.apache.solr.schema.IndexSchema;
          +import org.apache.solr.schema.SchemaField;
          +import org.apache.solr.search.DocIterator;
          +import org.apache.solr.search.DocList;
          +import org.apache.solr.search.SolrIndexSearcher;
          +import org.apache.solr.util.RTimerTree;
          +import org.apache.solr.util.plugin.PluginInfoInitialized;
          +
          +/*
          + * TODO: The HighlightComponent should not call rewrite on the query; it should be up to the
          + * SolrHighlighter to do if needed. Furthermore this arrangement is odd – why are these abstractions separate?
          + */
          +
          +/**
          + * Highlighter impl that uses

          {@link UnifiedHighlighter}

          + * <p>
          + * Example configuration with default values:
          + * <pre class="prettyprint">
          + * <requestHandler name="standard" class="solr.StandardRequestHandler">
          + * <lst name="defaults">
          + * <int name="hl.snippets">1</int>
          + * <str name="hl.tag.pre">&lt;em&gt;</str>
          + * <str name="hl.tag.post">&lt;/em&gt;</str>
          + * <str name="hl.tag.ellipsis">... </str>
          + * <bool name="hl.defaultSummary">true</bool>
          + * <str name="hl.encoder">simple</str>
          + * <float name="hl.score.k1">1.2</float>
          + * <float name="hl.score.b">0.75</float>
          + * <float name="hl.score.pivot">87</float>
          + * <str name="hl.bs.language"></str>
          + * <str name="hl.bs.country"></str>
          + * <str name="hl.bs.variant"></str>
          + * <str name="hl.bs.type">SENTENCE</str>
          + * <int name="hl.maxAnalyzedChars">10000</int>
          + * <bool name="hl.highlightMultiTerm">true</bool>
          + * </lst>
          + * </requestHandler>
          + * </pre>
          + * ...
          + * <pre class="prettyprint">
          + * <searchComponent class="solr.HighlightComponent" name="highlight">
          + * <highlighting class="org.apache.solr.highlight.UnifiedSolrHighlighter"/>
          + * </searchComponent>
          + * </pre>
          + * <p>
          + * Notes:
          + * <ul>
          + * <li>hl.q (string) can specify the query
          + * <li>hl.fl (string) specifies the field list.
          + * <li>hl.snippets (int) specifies how many snippets to return.
          + * <li>hl.tag.pre (string) specifies text which appears before a highlighted term.
          + * <li>hl.tag.post (string) specifies text which appears after a highlighted term.
          + * <li>hl.tag.ellipsis (string) specifies text which joins non-adjacent passages. The default is to retain each
          + * value in a list without joining them.
          + * <li>hl.defaultSummary (bool) specifies if a field should have a default summary of the leading text.
          + * <li>hl.encoder (string) can be 'html' (html escapes content) or 'simple' (no escaping).
          + * <li>hl.score.k1 (float) specifies bm25 scoring parameter 'k1'
          + * <li>hl.score.b (float) specifies bm25 scoring parameter 'b'
          + * <li>hl.score.pivot (float) specifies bm25 scoring parameter 'avgdl'
          + * <li>hl.bs.type (string) specifies how to divide text into passages: [SENTENCE, LINE, WORD, CHAR, WHOLE]
          + * <li>hl.bs.language (string) specifies language code for BreakIterator. default is empty string (root locale)
          + * <li>hl.bs.country (string) specifies country code for BreakIterator. default is empty string (root locale)
          + * <li>hl.bs.variant (string) specifies country code for BreakIterator. default is empty string (root locale)
          + * <li>hl.maxAnalyzedChars specifies how many characters at most will be processed in a document for any one field.
          + * <li>hl.highlightMultiTerm enables highlighting for range/wildcard/fuzzy/prefix queries at some cost.
          + * <li>hl.usePhraseHighlighter (bool) enables highlighting phrases and some other queries strictly at some cost.</li>
          + * </ul>
          + * TODO add hl.method, hl.cacheFieldValCharsThreshold
          + *
          + * @lucene.experimental
          + */
          +public class UnifiedSolrHighlighter extends SolrHighlighter implements PluginInfoInitialized {
          +
          + protected static final String SNIPPET_SEPARATOR = "\u0000";
          + private static final String[] ZERO_LEN_STR_ARRAY = new String[0];
          +
          + //TODO move to Solr HighlightParams
          — End diff –

          These TODOs should be addressed (and note corresponding docs TODO on line 109).

          Show
          githubbot ASF GitHub Bot added a comment - Github user dsmiley commented on a diff in the pull request: https://github.com/apache/lucene-solr/pull/107#discussion_r87738726 — Diff: solr/core/src/java/org/apache/solr/highlight/UnifiedSolrHighlighter.java — @@ -0,0 +1,366 @@ +/* + * Licensed to the Apache Software Foundation (ASF) under one or more + * contributor license agreements. See the NOTICE file distributed with + * this work for additional information regarding copyright ownership. + * The ASF licenses this file to You under the Apache License, Version 2.0 + * (the "License"); you may not use this file except in compliance with + * the License. You may obtain a copy of the License at + * + * http://www.apache.org/licenses/LICENSE-2.0 + * + * Unless required by applicable law or agreed to in writing, software + * distributed under the License is distributed on an "AS IS" BASIS, + * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. + * See the License for the specific language governing permissions and + * limitations under the License. + */ +package org.apache.solr.highlight; + +import java.io.IOException; +import java.text.BreakIterator; +import java.util.Collections; +import java.util.List; +import java.util.Locale; +import java.util.Map; +import java.util.Set; + +import org.apache.lucene.document.Document; +import org.apache.lucene.search.DocIdSetIterator; +import org.apache.lucene.search.Query; +import org.apache.lucene.search.postingshighlight.WholeBreakIterator; +import org.apache.lucene.search.uhighlight.DefaultPassageFormatter; +import org.apache.lucene.search.uhighlight.PassageFormatter; +import org.apache.lucene.search.uhighlight.PassageScorer; +import org.apache.lucene.search.uhighlight.UnifiedHighlighter; +import org.apache.solr.common.params.HighlightParams; +import org.apache.solr.common.params.SolrParams; +import org.apache.solr.common.util.NamedList; +import org.apache.solr.common.util.SimpleOrderedMap; +import org.apache.solr.core.PluginInfo; +import org.apache.solr.request.SolrQueryRequest; +import org.apache.solr.request.SolrRequestInfo; +import org.apache.solr.schema.IndexSchema; +import org.apache.solr.schema.SchemaField; +import org.apache.solr.search.DocIterator; +import org.apache.solr.search.DocList; +import org.apache.solr.search.SolrIndexSearcher; +import org.apache.solr.util.RTimerTree; +import org.apache.solr.util.plugin.PluginInfoInitialized; + +/* + * TODO: The HighlightComponent should not call rewrite on the query; it should be up to the + * SolrHighlighter to do if needed. Furthermore this arrangement is odd – why are these abstractions separate? + */ + +/** + * Highlighter impl that uses {@link UnifiedHighlighter} + * <p> + * Example configuration with default values: + * <pre class="prettyprint"> + * <requestHandler name="standard" class="solr.StandardRequestHandler"> + * <lst name="defaults"> + * <int name="hl.snippets">1</int> + * <str name="hl.tag.pre">&lt;em&gt;</str> + * <str name="hl.tag.post">&lt;/em&gt;</str> + * <str name="hl.tag.ellipsis">... </str> + * <bool name="hl.defaultSummary">true</bool> + * <str name="hl.encoder">simple</str> + * <float name="hl.score.k1">1.2</float> + * <float name="hl.score.b">0.75</float> + * <float name="hl.score.pivot">87</float> + * <str name="hl.bs.language"></str> + * <str name="hl.bs.country"></str> + * <str name="hl.bs.variant"></str> + * <str name="hl.bs.type">SENTENCE</str> + * <int name="hl.maxAnalyzedChars">10000</int> + * <bool name="hl.highlightMultiTerm">true</bool> + * </lst> + * </requestHandler> + * </pre> + * ... + * <pre class="prettyprint"> + * <searchComponent class="solr.HighlightComponent" name="highlight"> + * <highlighting class="org.apache.solr.highlight.UnifiedSolrHighlighter"/> + * </searchComponent> + * </pre> + * <p> + * Notes: + * <ul> + * <li>hl.q (string) can specify the query + * <li>hl.fl (string) specifies the field list. + * <li>hl.snippets (int) specifies how many snippets to return. + * <li>hl.tag.pre (string) specifies text which appears before a highlighted term. + * <li>hl.tag.post (string) specifies text which appears after a highlighted term. + * <li>hl.tag.ellipsis (string) specifies text which joins non-adjacent passages. The default is to retain each + * value in a list without joining them. + * <li>hl.defaultSummary (bool) specifies if a field should have a default summary of the leading text. + * <li>hl.encoder (string) can be 'html' (html escapes content) or 'simple' (no escaping). + * <li>hl.score.k1 (float) specifies bm25 scoring parameter 'k1' + * <li>hl.score.b (float) specifies bm25 scoring parameter 'b' + * <li>hl.score.pivot (float) specifies bm25 scoring parameter 'avgdl' + * <li>hl.bs.type (string) specifies how to divide text into passages: [SENTENCE, LINE, WORD, CHAR, WHOLE] + * <li>hl.bs.language (string) specifies language code for BreakIterator. default is empty string (root locale) + * <li>hl.bs.country (string) specifies country code for BreakIterator. default is empty string (root locale) + * <li>hl.bs.variant (string) specifies country code for BreakIterator. default is empty string (root locale) + * <li>hl.maxAnalyzedChars specifies how many characters at most will be processed in a document for any one field. + * <li>hl.highlightMultiTerm enables highlighting for range/wildcard/fuzzy/prefix queries at some cost. + * <li>hl.usePhraseHighlighter (bool) enables highlighting phrases and some other queries strictly at some cost.</li> + * </ul> + * TODO add hl.method, hl.cacheFieldValCharsThreshold + * + * @lucene.experimental + */ +public class UnifiedSolrHighlighter extends SolrHighlighter implements PluginInfoInitialized { + + protected static final String SNIPPET_SEPARATOR = "\u0000"; + private static final String[] ZERO_LEN_STR_ARRAY = new String [0] ; + + //TODO move to Solr HighlightParams — End diff – These TODOs should be addressed (and note corresponding docs TODO on line 109).
          Hide
          githubbot ASF GitHub Bot added a comment -

          Github user dsmiley commented on a diff in the pull request:

          https://github.com/apache/lucene-solr/pull/107#discussion_r87740491

          — Diff: solr/core/src/test/org/apache/solr/highlight/TestUnifiedSolrHighlighter.java —
          @@ -0,0 +1,222 @@
          +/*
          + * Licensed to the Apache Software Foundation (ASF) under one or more
          + * contributor license agreements. See the NOTICE file distributed with
          + * this work for additional information regarding copyright ownership.
          + * The ASF licenses this file to You under the Apache License, Version 2.0
          + * (the "License"); you may not use this file except in compliance with
          + * the License. You may obtain a copy of the License at
          + *
          + * http://www.apache.org/licenses/LICENSE-2.0
          + *
          + * Unless required by applicable law or agreed to in writing, software
          + * distributed under the License is distributed on an "AS IS" BASIS,
          + * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
          + * See the License for the specific language governing permissions and
          + * limitations under the License.
          + */
          +package org.apache.solr.highlight;
          +
          +import org.apache.solr.SolrTestCaseJ4;
          +import org.apache.solr.handler.component.HighlightComponent;
          +import org.apache.solr.schema.IndexSchema;
          +import org.junit.BeforeClass;
          +import org.junit.Ignore;
          +
          +/** simple tests for PostingsSolrHighlighter */
          +public class TestUnifiedSolrHighlighter extends SolrTestCaseJ4 {
          +
          + @BeforeClass
          + public static void beforeClass() throws Exception

          { + initCore("solrconfig-unifiedhighlight.xml", "schema-unifiedhighlight.xml"); + + // test our config is sane, just to be sure: + + // postingshighlighter should be used + SolrHighlighter highlighter = HighlightComponent.getHighlighter(h.getCore()); + assertTrue("wrong highlighter: " + highlighter.getClass(), highlighter instanceof UnifiedSolrHighlighter); + + // 'text' and 'text3' should have offsets, 'text2' should not + IndexSchema schema = h.getCore().getLatestSchema(); + assertTrue(schema.getField("text").storeOffsetsWithPositions()); + assertTrue(schema.getField("text3").storeOffsetsWithPositions()); + assertFalse(schema.getField("text2").storeOffsetsWithPositions()); + }

          +
          + @Override
          + public void setUp() throws Exception

          { + super.setUp(); + clearIndex(); + assertU(adoc("text", "document one", "text2", "document one", "text3", "crappy document", "id", "101")); + assertU(adoc("text", "second document", "text2", "second document", "text3", "crappier document", "id", "102")); + assertU(commit()); + }

          +
          + public void testSimple()

          { + assertQ("simplest test", + req("q", "text:document", "sort", "id asc", "hl", "true"), + "count(//lst[@name='highlighting']/*)=2", + "//lst[@name='highlighting']/lst[@name='101']/arr[@name='text']/str='<em>document</em> one'", + "//lst[@name='highlighting']/lst[@name='102']/arr[@name='text']/str='second <em>document</em>'"); + }

          +
          + public void testMultipleSnippetsReturned()

          { + clearIndex(); + assertU(adoc("text", "Document snippet one. Intermediate sentence. Document snippet two.", + "text2", "document one", "text3", "crappy document", "id", "101")); + assertU(commit()); + assertQ("multiple snippets test", + req("q", "text:document", "sort", "id asc", "hl", "true", "hl.snippets", "2", "hl.bs.type", "SENTENCE"), + "count(//lst[@name='highlighting']/lst[@name='101']/arr[@name='text']/*)=2", + "//lst[@name='highlighting']/lst[@name='101']/arr/str[1]='<em>Document</em> snippet one. '", + "//lst[@name='highlighting']/lst[@name='101']/arr/str[2]='<em>Document</em> snippet two.'"); + }

          +
          + public void testStrictPhrasesEnabledByDefault()

          { + clearIndex(); + assertU(adoc("text", "Strict phrases should be enabled for phrases", + "text2", "document one", "text3", "crappy document", "id", "101")); + assertU(commit()); + assertQ("strict phrase handling", + req("q", "text:\"strict phrases\"", "sort", "id asc", "hl", "true"), + "count(//lst[@name='highlighting']/lst[@name='101']/arr[@name='text']/*)=1", + "//lst[@name='highlighting']/lst[@name='101']/arr/str[1]='<em>Strict</em> <em>phrases</em> should be enabled for phrases'"); + }

          +
          + public void testStrictPhrasesCanBeDisabled()

          { + clearIndex(); + assertU(adoc("text", "Strict phrases should be disabled for phrases", + "text2", "document one", "text3", "crappy document", "id", "101")); + assertU(commit()); + assertQ("strict phrase handling", + req("q", "text:\"strict phrases\"", "sort", "id asc", "hl", "true", "hl.usePhraseHighlighter", "false"), + "count(//lst[@name='highlighting']/lst[@name='101']/arr[@name='text']/*)=1", + "//lst[@name='highlighting']/lst[@name='101']/arr/str[1]='<em>Strict</em> <em>phrases</em> should be disabled for <em>phrases</em>'"); + }

          +
          + public void testMultiTermQueryEnabledByDefault()

          { + clearIndex(); + assertU(adoc("text", "Aviary Avenue document", + "text2", "document one", "text3", "crappy document", "id", "101")); + assertU(commit()); + assertQ("multi term query handling", + req("q", "text:av*", "sort", "id asc", "hl", "true"), + "count(//lst[@name='highlighting']/lst[@name='101']/arr[@name='text']/*)=1", + "//lst[@name='highlighting']/lst[@name='101']/arr/str[1]='<em>Aviary</em> <em>Avenue</em> document'"); + }

          +
          + public void testMultiTermQueryCanBeDisabled()

          { + clearIndex(); + assertU(adoc("text", "Aviary Avenue document", + "text2", "document one", "text3", "crappy document", "id", "101")); + assertU(commit()); + assertQ("multi term query handling", + req("q", "text:av*", "sort", "id asc", "hl", "true", "hl.highlightMultiTerm", "false"), + "count(//lst[@name='highlighting']/lst[@name='101']/arr[@name='text']/*)=0"); + }

          +
          + public void testPagination()

          { + assertQ("pagination test", + req("q", "text:document", "sort", "id asc", "hl", "true", "rows", "1", "start", "1"), + "count(//lst[@name='highlighting']/*)=1", + "//lst[@name='highlighting']/lst[@name='102']/arr[@name='text']/str='second <em>document</em>'"); + }

          +
          + public void testEmptySnippet()

          { + assertQ("null snippet test", + req("q", "text:one OR *:*", "sort", "id asc", "hl", "true"), + "count(//lst[@name='highlighting']/*)=2", + "//lst[@name='highlighting']/lst[@name='101']/arr[@name='text']/str='document <em>one</em>'", + "count(//lst[@name='highlighting']/lst[@name='102']/arr[@name='text']/*)=0"); + }

          +
          + public void testDefaultSummary()

          { + assertQ("null snippet test", + req("q", "text:one OR *:*", "sort", "id asc", "hl", "true", "hl.defaultSummary", "true"), + "count(//lst[@name='highlighting']/*)=2", + "//lst[@name='highlighting']/lst[@name='101']/arr[@name='text']/str='document <em>one</em>'", + "//lst[@name='highlighting']/lst[@name='102']/arr[@name='text']/str='second document'"); + }

          +
          + public void testDifferentField()

          { + assertQ("highlighting text3", + req("q", "text3:document", "sort", "id asc", "hl", "true", "hl.fl", "text3"), + "count(//lst[@name='highlighting']/*)=2", + "//lst[@name='highlighting']/lst[@name='101']/arr[@name='text3']/str='crappy <em>document</em>'", + "//lst[@name='highlighting']/lst[@name='102']/arr[@name='text3']/str='crappier <em>document</em>'"); + }

          +
          + public void testTwoFields()

          { + assertQ("highlighting text and text3", + req("q", "text:document text3:document", "sort", "id asc", "hl", "true", "hl.fl", "text,text3"), + "count(//lst[@name='highlighting']/*)=2", + "//lst[@name='highlighting']/lst[@name='101']/arr[@name='text']/str='<em>document</em> one'", + "//lst[@name='highlighting']/lst[@name='101']/arr[@name='text3']/str='crappy <em>document</em>'", + "//lst[@name='highlighting']/lst[@name='102']/arr[@name='text']/str='second <em>document</em>'", + "//lst[@name='highlighting']/lst[@name='102']/arr[@name='text3']/str='crappier <em>document</em>'"); + }

          +
          + //todo: need to configure field that is not at least stored, hence no analysis
          + //otherwise, this highlighter is resilient
          + @Ignore
          — End diff –

          It seems this test should be dropped. Indeed, this highlighter is resilient; it just needs to be stored.

          Show
          githubbot ASF GitHub Bot added a comment - Github user dsmiley commented on a diff in the pull request: https://github.com/apache/lucene-solr/pull/107#discussion_r87740491 — Diff: solr/core/src/test/org/apache/solr/highlight/TestUnifiedSolrHighlighter.java — @@ -0,0 +1,222 @@ +/* + * Licensed to the Apache Software Foundation (ASF) under one or more + * contributor license agreements. See the NOTICE file distributed with + * this work for additional information regarding copyright ownership. + * The ASF licenses this file to You under the Apache License, Version 2.0 + * (the "License"); you may not use this file except in compliance with + * the License. You may obtain a copy of the License at + * + * http://www.apache.org/licenses/LICENSE-2.0 + * + * Unless required by applicable law or agreed to in writing, software + * distributed under the License is distributed on an "AS IS" BASIS, + * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. + * See the License for the specific language governing permissions and + * limitations under the License. + */ +package org.apache.solr.highlight; + +import org.apache.solr.SolrTestCaseJ4; +import org.apache.solr.handler.component.HighlightComponent; +import org.apache.solr.schema.IndexSchema; +import org.junit.BeforeClass; +import org.junit.Ignore; + +/** simple tests for PostingsSolrHighlighter */ +public class TestUnifiedSolrHighlighter extends SolrTestCaseJ4 { + + @BeforeClass + public static void beforeClass() throws Exception { + initCore("solrconfig-unifiedhighlight.xml", "schema-unifiedhighlight.xml"); + + // test our config is sane, just to be sure: + + // postingshighlighter should be used + SolrHighlighter highlighter = HighlightComponent.getHighlighter(h.getCore()); + assertTrue("wrong highlighter: " + highlighter.getClass(), highlighter instanceof UnifiedSolrHighlighter); + + // 'text' and 'text3' should have offsets, 'text2' should not + IndexSchema schema = h.getCore().getLatestSchema(); + assertTrue(schema.getField("text").storeOffsetsWithPositions()); + assertTrue(schema.getField("text3").storeOffsetsWithPositions()); + assertFalse(schema.getField("text2").storeOffsetsWithPositions()); + } + + @Override + public void setUp() throws Exception { + super.setUp(); + clearIndex(); + assertU(adoc("text", "document one", "text2", "document one", "text3", "crappy document", "id", "101")); + assertU(adoc("text", "second document", "text2", "second document", "text3", "crappier document", "id", "102")); + assertU(commit()); + } + + public void testSimple() { + assertQ("simplest test", + req("q", "text:document", "sort", "id asc", "hl", "true"), + "count(//lst[@name='highlighting']/*)=2", + "//lst[@name='highlighting']/lst[@name='101']/arr[@name='text']/str='<em>document</em> one'", + "//lst[@name='highlighting']/lst[@name='102']/arr[@name='text']/str='second <em>document</em>'"); + } + + public void testMultipleSnippetsReturned() { + clearIndex(); + assertU(adoc("text", "Document snippet one. Intermediate sentence. Document snippet two.", + "text2", "document one", "text3", "crappy document", "id", "101")); + assertU(commit()); + assertQ("multiple snippets test", + req("q", "text:document", "sort", "id asc", "hl", "true", "hl.snippets", "2", "hl.bs.type", "SENTENCE"), + "count(//lst[@name='highlighting']/lst[@name='101']/arr[@name='text']/*)=2", + "//lst[@name='highlighting']/lst[@name='101']/arr/str[1]='<em>Document</em> snippet one. '", + "//lst[@name='highlighting']/lst[@name='101']/arr/str[2]='<em>Document</em> snippet two.'"); + } + + public void testStrictPhrasesEnabledByDefault() { + clearIndex(); + assertU(adoc("text", "Strict phrases should be enabled for phrases", + "text2", "document one", "text3", "crappy document", "id", "101")); + assertU(commit()); + assertQ("strict phrase handling", + req("q", "text:\"strict phrases\"", "sort", "id asc", "hl", "true"), + "count(//lst[@name='highlighting']/lst[@name='101']/arr[@name='text']/*)=1", + "//lst[@name='highlighting']/lst[@name='101']/arr/str[1]='<em>Strict</em> <em>phrases</em> should be enabled for phrases'"); + } + + public void testStrictPhrasesCanBeDisabled() { + clearIndex(); + assertU(adoc("text", "Strict phrases should be disabled for phrases", + "text2", "document one", "text3", "crappy document", "id", "101")); + assertU(commit()); + assertQ("strict phrase handling", + req("q", "text:\"strict phrases\"", "sort", "id asc", "hl", "true", "hl.usePhraseHighlighter", "false"), + "count(//lst[@name='highlighting']/lst[@name='101']/arr[@name='text']/*)=1", + "//lst[@name='highlighting']/lst[@name='101']/arr/str[1]='<em>Strict</em> <em>phrases</em> should be disabled for <em>phrases</em>'"); + } + + public void testMultiTermQueryEnabledByDefault() { + clearIndex(); + assertU(adoc("text", "Aviary Avenue document", + "text2", "document one", "text3", "crappy document", "id", "101")); + assertU(commit()); + assertQ("multi term query handling", + req("q", "text:av*", "sort", "id asc", "hl", "true"), + "count(//lst[@name='highlighting']/lst[@name='101']/arr[@name='text']/*)=1", + "//lst[@name='highlighting']/lst[@name='101']/arr/str[1]='<em>Aviary</em> <em>Avenue</em> document'"); + } + + public void testMultiTermQueryCanBeDisabled() { + clearIndex(); + assertU(adoc("text", "Aviary Avenue document", + "text2", "document one", "text3", "crappy document", "id", "101")); + assertU(commit()); + assertQ("multi term query handling", + req("q", "text:av*", "sort", "id asc", "hl", "true", "hl.highlightMultiTerm", "false"), + "count(//lst[@name='highlighting']/lst[@name='101']/arr[@name='text']/*)=0"); + } + + public void testPagination() { + assertQ("pagination test", + req("q", "text:document", "sort", "id asc", "hl", "true", "rows", "1", "start", "1"), + "count(//lst[@name='highlighting']/*)=1", + "//lst[@name='highlighting']/lst[@name='102']/arr[@name='text']/str='second <em>document</em>'"); + } + + public void testEmptySnippet() { + assertQ("null snippet test", + req("q", "text:one OR *:*", "sort", "id asc", "hl", "true"), + "count(//lst[@name='highlighting']/*)=2", + "//lst[@name='highlighting']/lst[@name='101']/arr[@name='text']/str='document <em>one</em>'", + "count(//lst[@name='highlighting']/lst[@name='102']/arr[@name='text']/*)=0"); + } + + public void testDefaultSummary() { + assertQ("null snippet test", + req("q", "text:one OR *:*", "sort", "id asc", "hl", "true", "hl.defaultSummary", "true"), + "count(//lst[@name='highlighting']/*)=2", + "//lst[@name='highlighting']/lst[@name='101']/arr[@name='text']/str='document <em>one</em>'", + "//lst[@name='highlighting']/lst[@name='102']/arr[@name='text']/str='second document'"); + } + + public void testDifferentField() { + assertQ("highlighting text3", + req("q", "text3:document", "sort", "id asc", "hl", "true", "hl.fl", "text3"), + "count(//lst[@name='highlighting']/*)=2", + "//lst[@name='highlighting']/lst[@name='101']/arr[@name='text3']/str='crappy <em>document</em>'", + "//lst[@name='highlighting']/lst[@name='102']/arr[@name='text3']/str='crappier <em>document</em>'"); + } + + public void testTwoFields() { + assertQ("highlighting text and text3", + req("q", "text:document text3:document", "sort", "id asc", "hl", "true", "hl.fl", "text,text3"), + "count(//lst[@name='highlighting']/*)=2", + "//lst[@name='highlighting']/lst[@name='101']/arr[@name='text']/str='<em>document</em> one'", + "//lst[@name='highlighting']/lst[@name='101']/arr[@name='text3']/str='crappy <em>document</em>'", + "//lst[@name='highlighting']/lst[@name='102']/arr[@name='text']/str='second <em>document</em>'", + "//lst[@name='highlighting']/lst[@name='102']/arr[@name='text3']/str='crappier <em>document</em>'"); + } + + //todo: need to configure field that is not at least stored, hence no analysis + //otherwise, this highlighter is resilient + @Ignore — End diff – It seems this test should be dropped. Indeed, this highlighter is resilient; it just needs to be stored.
          Hide
          dsmiley David Smiley added a comment -

          (see github comments)
          Looking good Tim; just some super minor stuff.

          It's unfortunate that users need to go into their solrconfig to enable this highlighter. What if there was a boolean param hl.useUnifiedHighlighter that was evaluated at the very beginning of DefaultSolrHighlighter.doHighlighting that delegated to it? I know from a code perspective it's very hacky but consider the user experience. In Solr we're trying to push towards avoiding the need to touch configs. Fortunately, this highlighter is completely configured from params (not so for SH & FVH)... except for enabling it in the first place (which we can change here). The DSH after all already handles FVH. In a future issue for 7.0 I'd like to propose some re-arrangement to the Solr highlighting code to simplify & clean up things.

          Show
          dsmiley David Smiley added a comment - (see github comments) Looking good Tim; just some super minor stuff. It's unfortunate that users need to go into their solrconfig to enable this highlighter. What if there was a boolean param hl.useUnifiedHighlighter that was evaluated at the very beginning of DefaultSolrHighlighter.doHighlighting that delegated to it? I know from a code perspective it's very hacky but consider the user experience. In Solr we're trying to push towards avoiding the need to touch configs. Fortunately, this highlighter is completely configured from params (not so for SH & FVH)... except for enabling it in the first place (which we can change here). The DSH after all already handles FVH. In a future issue for 7.0 I'd like to propose some re-arrangement to the Solr highlighting code to simplify & clean up things.
          Hide
          Timothy055 Timothy M. Rodriguez added a comment -

          Thanks for catching those things. I've fixed them and pushed to the pr.

          Regarding the hl.useUnifiedHighlighter I'm actually very in favor of that idea, but perhaps that logic would be better in the highlight component? In that way the actual highlighters would be more like the facet params that help tweak with algorithm gets used. I agree that we really shouldn't have to "configure" the highlighters. Perhaps that should be a separate issue though more in line with the the other changes mentioned?

          Show
          Timothy055 Timothy M. Rodriguez added a comment - Thanks for catching those things. I've fixed them and pushed to the pr. Regarding the hl.useUnifiedHighlighter I'm actually very in favor of that idea, but perhaps that logic would be better in the highlight component? In that way the actual highlighters would be more like the facet params that help tweak with algorithm gets used. I agree that we really shouldn't have to "configure" the highlighters. Perhaps that should be a separate issue though more in line with the the other changes mentioned?
          Hide
          dsmiley David Smiley added a comment -

          Sure, put the proposed logic into the HighlightComponent.

          It's definitely a separate issue for the Standard & FVH to be fully configurable via params.

          Show
          dsmiley David Smiley added a comment - Sure, put the proposed logic into the HighlightComponent. It's definitely a separate issue for the Standard & FVH to be fully configurable via params.
          Hide
          dsmiley David Smiley added a comment -

          After you add hl.useUnifiedHighlighter, I think the tests should be updated to go about it his way instead of custom search component because people probably won't bother to explicitly change the search component given this approach works without fuss.

          Maybe hl.method should be hl.offsetSource so as to not suggest you're picking the highlighter implementation overall?

          Show
          dsmiley David Smiley added a comment - After you add hl.useUnifiedHighlighter, I think the tests should be updated to go about it his way instead of custom search component because people probably won't bother to explicitly change the search component given this approach works without fuss. Maybe hl.method should be hl.offsetSource so as to not suggest you're picking the highlighter implementation overall?
          Hide
          dsmiley David Smiley added a comment -

          Also I'm on the fence if we should support hl.simple.pre (HighlightParams.SIMPLE_PRE) and corresponding -post. We probably should for a better user experience. But support those as fallbacks, as I think the hl.tag.pre is a better name.

          Show
          dsmiley David Smiley added a comment - Also I'm on the fence if we should support hl.simple.pre ( HighlightParams.SIMPLE_PRE ) and corresponding -post. We probably should for a better user experience. But support those as fallbacks, as I think the hl.tag.pre is a better name.
          Hide
          Timothy055 Timothy M. Rodriguez added a comment -

          I'm okay with hl.tag.pre/post, but it may not always be a tag. Perhaps something like hl.pre.marker? or hl.pre.sigil?

          Show
          Timothy055 Timothy M. Rodriguez added a comment - I'm okay with hl.tag.pre/post, but it may not always be a tag. Perhaps something like hl.pre.marker? or hl.pre.sigil?
          Hide
          dsmiley David Smiley added a comment -

          -1 I would hate to see new parameters when there are semantically equivalent ones already.

          Show
          dsmiley David Smiley added a comment - -1 I would hate to see new parameters when there are semantically equivalent ones already.
          Hide
          Timothy055 Timothy M. Rodriguez added a comment -

          I thought the suggestion was to use hl.tag.pre instead of hl.simple.pre?

          Show
          Timothy055 Timothy M. Rodriguez added a comment - I thought the suggestion was to use hl.tag.pre instead of hl.simple.pre?
          Hide
          dsmiley David Smiley added a comment -

          I suggested to support both.

          Show
          dsmiley David Smiley added a comment - I suggested to support both .
          Hide
          Timothy055 Timothy M. Rodriguez added a comment -

          I was suggesting instead of hl.tag.pre, but realized that's used too. No sense adding a third. Even though both names are not so ideal IMO

          Show
          Timothy055 Timothy M. Rodriguez added a comment - I was suggesting instead of hl.tag.pre, but realized that's used too. No sense adding a third. Even though both names are not so ideal IMO
          Hide
          Timothy055 Timothy M. Rodriguez added a comment -

          I've posted an initial commit that allows the user to override the configured highlighter based on the "hl.method" parameter.

          Two things I want to highlight:

          • The highlighter can no longer safely be statically determined using HighlightComponent.getHiglighter since a request parameter can override the pre-configured one. I've marked this usage deprecated as it affects quite a few places outside of this change. Is that okay?
          • Use of an enum for collecting all the highlight methods and giving a bit extra type safety when switching over the values in the override. I'm not sure if this is out of style and several static String fields is preferred (although I personally prefer the former).
          Show
          Timothy055 Timothy M. Rodriguez added a comment - I've posted an initial commit that allows the user to override the configured highlighter based on the "hl.method" parameter. Two things I want to highlight: The highlighter can no longer safely be statically determined using HighlightComponent.getHiglighter since a request parameter can override the pre-configured one. I've marked this usage deprecated as it affects quite a few places outside of this change. Is that okay? Use of an enum for collecting all the highlight methods and giving a bit extra type safety when switching over the values in the override. I'm not sure if this is out of style and several static String fields is preferred (although I personally prefer the former).
          Hide
          Timothy055 Timothy M. Rodriguez added a comment -

          Added a normalizeParameters method that will set tag.pre or post if simple.pre or post are set.

          Show
          Timothy055 Timothy M. Rodriguez added a comment - Added a normalizeParameters method that will set tag.pre or post if simple.pre or post are set.
          Hide
          Timothy055 Timothy M. Rodriguez added a comment -

          I've pushed changes to the review feedback. For some reason the build bot seems to have stopped tracking the changes, but I've confirmed they are on the GitHub PR. The unified highlighter is now configurable on a per request basis without even being configured as the highlight component!

          Show
          Timothy055 Timothy M. Rodriguez added a comment - I've pushed changes to the review feedback. For some reason the build bot seems to have stopped tracking the changes, but I've confirmed they are on the GitHub PR. The unified highlighter is now configurable on a per request basis without even being configured as the highlight component!
          Hide
          githubbot ASF GitHub Bot added a comment -

          Github user dsmiley commented on a diff in the pull request:

          https://github.com/apache/lucene-solr/pull/107#discussion_r89039004

          — Diff: solr/core/src/java/org/apache/solr/handler/component/HighlightComponent.java —
          @@ -184,6 +185,20 @@ public void process(ResponseBuilder rb) throws IOException {
          }
          }

          + /**
          + * Normalizes parameters between highlighters
          + */
          + private SolrParams normalizeParameters(SolrParams params) {
          — End diff –

          You've coded this such that SIMPLE_PRE overrides TAG_PRE which is not what we want I think? Furthermore, this is coded such that it only overrides at the global level which won't work for field-specific settings like `f.myfieldname.hl.tag.pre` which we'd want to examine `f.myfieldname.hl.simple.pre`. I appreciate where you were going with this, but in light of the latter point, I think you should simply modify the Solr UH adapter to lookup say "pre" like so:

          String preTag = params.getFieldParam(fieldName, HighlightParams.TAG_PRE,
          params.getFieldParam(fieldName, HighlightParams.SIMPLE_PRE, "<em>");
          );

          Show
          githubbot ASF GitHub Bot added a comment - Github user dsmiley commented on a diff in the pull request: https://github.com/apache/lucene-solr/pull/107#discussion_r89039004 — Diff: solr/core/src/java/org/apache/solr/handler/component/HighlightComponent.java — @@ -184,6 +185,20 @@ public void process(ResponseBuilder rb) throws IOException { } } + /** + * Normalizes parameters between highlighters + */ + private SolrParams normalizeParameters(SolrParams params) { — End diff – You've coded this such that SIMPLE_PRE overrides TAG_PRE which is not what we want I think? Furthermore, this is coded such that it only overrides at the global level which won't work for field-specific settings like `f.myfieldname.hl.tag.pre` which we'd want to examine `f.myfieldname.hl.simple.pre`. I appreciate where you were going with this, but in light of the latter point, I think you should simply modify the Solr UH adapter to lookup say "pre" like so: String preTag = params.getFieldParam(fieldName, HighlightParams.TAG_PRE, params.getFieldParam(fieldName, HighlightParams.SIMPLE_PRE, "<em>"); );
          Hide
          githubbot ASF GitHub Bot added a comment -

          Github user Timothy055 commented on a diff in the pull request:

          https://github.com/apache/lucene-solr/pull/107#discussion_r89227253

          — Diff: solr/core/src/java/org/apache/solr/handler/component/HighlightComponent.java —
          @@ -184,6 +185,20 @@ public void process(ResponseBuilder rb) throws IOException {
          }
          }

          + /**
          + * Normalizes parameters between highlighters
          + */
          + private SolrParams normalizeParameters(SolrParams params) {
          — End diff –

          Fixed

          Show
          githubbot ASF GitHub Bot added a comment - Github user Timothy055 commented on a diff in the pull request: https://github.com/apache/lucene-solr/pull/107#discussion_r89227253 — Diff: solr/core/src/java/org/apache/solr/handler/component/HighlightComponent.java — @@ -184,6 +185,20 @@ public void process(ResponseBuilder rb) throws IOException { } } + /** + * Normalizes parameters between highlighters + */ + private SolrParams normalizeParameters(SolrParams params) { — End diff – Fixed
          Hide
          githubbot ASF GitHub Bot added a comment -

          Github user Timothy055 commented on the issue:

          https://github.com/apache/lucene-solr/pull/107

          I've made some more updates to the documentation for the pre and post parameters as well as fixed the defaulting logic.

          Show
          githubbot ASF GitHub Bot added a comment - Github user Timothy055 commented on the issue: https://github.com/apache/lucene-solr/pull/107 I've made some more updates to the documentation for the pre and post parameters as well as fixed the defaulting logic.
          Hide
          dsmiley David Smiley added a comment -

          I iterated further from your work: https://github.com/dsmiley/lucene-solr/commit/a29dae9dde1dc018bd704cefb4fd7560d4eeb9fc
          I expanded the support of hl.method to all of: original|fastVector|postings|unified. I also organized/documented HighlightParams with notes on which highlighters use which. I figure "original" is a nice name for the Highlighter.java implementation instead of "standard", as "original" will remain a good name, even if it's not necessarily the default in the future. "standard" kinda means "default".

          What do you think? Feel free to either comment here or on my commit in GitHub; whatever suits you. Once there's no further changes, I'll promptly post a final .patch file here then commit.

          Later in December I'll update the Solr Ref Guide. I might convert the PostingsHighlighter page to be about the UnifiedHighlighter, and then on this page address how the PostingsHighlighter varies and how it's invoked instead (very easy now thanks to hl.method=postings). Since the UH is a derivative of the PH, completely compatible; we might very well remove the PH in 7.0.

          Show
          dsmiley David Smiley added a comment - I iterated further from your work: https://github.com/dsmiley/lucene-solr/commit/a29dae9dde1dc018bd704cefb4fd7560d4eeb9fc I expanded the support of hl.method to all of: original|fastVector|postings|unified. I also organized/documented HighlightParams with notes on which highlighters use which. I figure "original" is a nice name for the Highlighter.java implementation instead of "standard", as "original" will remain a good name, even if it's not necessarily the default in the future. "standard" kinda means "default". What do you think? Feel free to either comment here or on my commit in GitHub; whatever suits you. Once there's no further changes, I'll promptly post a final .patch file here then commit. Later in December I'll update the Solr Ref Guide. I might convert the PostingsHighlighter page to be about the UnifiedHighlighter, and then on this page address how the PostingsHighlighter varies and how it's invoked instead (very easy now thanks to hl.method=postings). Since the UH is a derivative of the PH, completely compatible; we might very well remove the PH in 7.0.
          Hide
          Timothy055 Timothy M. Rodriguez added a comment -

          Looks great! Adding the other highlighters to method really fleshed it out. Also in favor of the change from "default" to "original". No further suggested changes other than a rename on the FASTVECTOR enum to FAST_VECTOR. Let me know if you need any help with the wiki in December. Would be glad to contribute there as well.

          Show
          Timothy055 Timothy M. Rodriguez added a comment - Looks great! Adding the other highlighters to method really fleshed it out. Also in favor of the change from "default" to "original". No further suggested changes other than a rename on the FASTVECTOR enum to FAST_VECTOR. Let me know if you need any help with the wiki in December. Would be glad to contribute there as well.
          Hide
          dsmiley David Smiley added a comment -

          Here's the patch, with FASTVECTOR -> FAST_VECTOR. Proposed CHANGES.txt will be:

          New Feature:

          SOLR-9708: Added UnifiedSolrHighlighter, a highlighter adapter for Lucene's UnifiedHighlighter. The adapter is a derivative of the PostingsSolrHighlighter, supporting mostly the same parameters with some differences.
          Introduced "hl.method" parameter which can be set to original|fastVector|postings|unified to pick the highlighter at runtime without the need to modify solrconfig from the default configuration. hl.useFastVectorHighlighter is now considered deprecated in lieu of hl.method=fastVector.

          Upgrading:

          You are encouraged to try out the UnifiedHighlighter by setting hl.method=unified and report feedback. It might become the default in 7.0. It's more efficient/faster than the other highlighters, especially compared to the original Highlighter. That said, some options aren't supported yet, notably hl.fragsize and hl.requireFieldMatch=false. It will get more features in time, especially with your input. See HighlightParams.java for a listing of highlight parameters annotated with which highlighters use them.


          I don't recall seeing this kind of note before in the upgrading section but I think it's appropriate. Nobody actually has to do anything, but it's in everyone's interest to try it out.

          BTW thanks Tim for entertaining my request to incorporate hl.method. In hindsight it deserved a separate issue.

          I'll commit tomorrow night.

          Show
          dsmiley David Smiley added a comment - Here's the patch, with FASTVECTOR -> FAST_VECTOR. Proposed CHANGES.txt will be: New Feature: SOLR-9708 : Added UnifiedSolrHighlighter, a highlighter adapter for Lucene's UnifiedHighlighter. The adapter is a derivative of the PostingsSolrHighlighter, supporting mostly the same parameters with some differences. Introduced "hl.method" parameter which can be set to original|fastVector|postings|unified to pick the highlighter at runtime without the need to modify solrconfig from the default configuration. hl.useFastVectorHighlighter is now considered deprecated in lieu of hl.method=fastVector. Upgrading: You are encouraged to try out the UnifiedHighlighter by setting hl.method=unified and report feedback. It might become the default in 7.0. It's more efficient/faster than the other highlighters, especially compared to the original Highlighter. That said, some options aren't supported yet, notably hl.fragsize and hl.requireFieldMatch=false. It will get more features in time, especially with your input. See HighlightParams.java for a listing of highlight parameters annotated with which highlighters use them. I don't recall seeing this kind of note before in the upgrading section but I think it's appropriate. Nobody actually has to do anything, but it's in everyone's interest to try it out. BTW thanks Tim for entertaining my request to incorporate hl.method. In hindsight it deserved a separate issue. I'll commit tomorrow night.
          Hide
          Timothy055 Timothy M. Rodriguez added a comment -

          Haha, no problem. It'll improve usability quite a bit to be able to dynamically invoke it per request (and the other highlighters). I'm glad it landed with the initial Solr release of the unified highlighter.

          Show
          Timothy055 Timothy M. Rodriguez added a comment - Haha, no problem. It'll improve usability quite a bit to be able to dynamically invoke it per request (and the other highlighters). I'm glad it landed with the initial Solr release of the unified highlighter.
          Hide
          jira-bot ASF subversion and git services added a comment -

          Commit 4314c51c66de1eed0dbc4897684e79935ebfd55e in lucene-solr's branch refs/heads/master from David Smiley
          [ https://git-wip-us.apache.org/repos/asf?p=lucene-solr.git;h=4314c51 ]

          SOLR-9708: Added UnifiedSolrHighlighter. Added hl.method=original|fastVector|postings|unified

          Show
          jira-bot ASF subversion and git services added a comment - Commit 4314c51c66de1eed0dbc4897684e79935ebfd55e in lucene-solr's branch refs/heads/master from David Smiley [ https://git-wip-us.apache.org/repos/asf?p=lucene-solr.git;h=4314c51 ] SOLR-9708 : Added UnifiedSolrHighlighter. Added hl.method=original|fastVector|postings|unified
          Hide
          jira-bot ASF subversion and git services added a comment -

          Commit a089bf2827e536b8894c4181844bcf07cb6f9c79 in lucene-solr's branch refs/heads/branch_6x from David Smiley
          [ https://git-wip-us.apache.org/repos/asf?p=lucene-solr.git;h=a089bf2 ]

          SOLR-9708: Added UnifiedSolrHighlighter. Added hl.method=original|fastVector|postings|unified

          (cherry picked from commit 4314c51)

          Show
          jira-bot ASF subversion and git services added a comment - Commit a089bf2827e536b8894c4181844bcf07cb6f9c79 in lucene-solr's branch refs/heads/branch_6x from David Smiley [ https://git-wip-us.apache.org/repos/asf?p=lucene-solr.git;h=a089bf2 ] SOLR-9708 : Added UnifiedSolrHighlighter. Added hl.method=original|fastVector|postings|unified (cherry picked from commit 4314c51)
          Hide
          dsmiley David Smiley added a comment -

          As I was working on highlighter documentation... I think the default we use for hl.maxAnalyzedChars should be equal to that of the Original Highlighter, which is 51200 – certainly not less. After all, it certainly does a faster job, and this parameter is a performance oriented threshold. I see we're currently using the default in the UnifiedHighlighter which is 10000. This fits within the overarching goal of making transitioning to this highlighter straight-forward minimizing gotchas.

          Jim Ferenczi can I simply commit a 1-liner change to set this default in 6.4 or shall I file a new issue?

          Show
          dsmiley David Smiley added a comment - As I was working on highlighter documentation... I think the default we use for hl.maxAnalyzedChars should be equal to that of the Original Highlighter, which is 51200 – certainly not less. After all, it certainly does a faster job, and this parameter is a performance oriented threshold. I see we're currently using the default in the UnifiedHighlighter which is 10000. This fits within the overarching goal of making transitioning to this highlighter straight-forward minimizing gotchas. Jim Ferenczi can I simply commit a 1-liner change to set this default in 6.4 or shall I file a new issue?

            People

            • Assignee:
              dsmiley David Smiley
              Reporter:
              Timothy055 Timothy M. Rodriguez
            • Votes:
              1 Vote for this issue
              Watchers:
              8 Start watching this issue

              Dates

              • Created:
                Updated:
                Resolved:

                Development