Solr: SOLR-937

Highlighting problem related to stemming

    Details

    • Type: Bug
    • Status: Reopened
    • Priority: Major
    • Resolution: Unresolved
    • Affects Version/s: 1.3
    • Fix Version/s: None
    • Component/s: highlighter
    • Labels:
      None

      Description

      Using the example data (as loaded by "ant run-example") from the latest dev version, add the words "electronics" and "connector" to the features field of the first doc in ipod_other.xml. Now run the following query:

      http://localhost:8983/solr/select/?q=electronics&hl=true&hl.fl=features+cat

      will show "electronics" highlighted in the features field but not in the cat field. If you search instead for "connector", it is highlighted in both.

      This seems like a bug to me. A possible, but not entirely satisfactory, work-around is to copy the cat field into another field that is stemmed (for example with copyField) and use that other field for highlighting (assuming the search is on the default search field, and not on cat).


          Activity

          Erick Erickson added a comment:

          2013 Old JIRA cleanup

          Chris Harris added a comment:

          I haven't taken the time to reproduce this particular issue, but I think the problem is not limited to stemming. Assuming you aren't specifying particular field names in your query, the problem can be summarized like this:

          Solr (at least as of 1.4) can produce incorrect highlights whenever the analyzer for your index's default search field differs from the analyzer for your highlight field(s). The HighlightComponent takes the Query object parsed by the QueryComponent (whose terms were produced by the default field's analyzer) and applies it unchanged to each highlighting field (whose text is tokenized by a different analyzer). Because the two analyzers may turn the same word into different terms, the result can be missing highlights.
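          The mismatch can be reproduced outside Solr with a toy model. This is a minimal Java sketch, not Solr or Lucene code: stemmingAnalyzer and whitespaceAnalyzer are hypothetical stand-ins, assumed to approximate a stemmed default search field and an unstemmed cat field, and the field value "electronics connector" is illustrative only. The query is analyzed once with the default field's analyzer and the resulting terms are reused unchanged for every highlighted field.

```java
import java.util.ArrayList;
import java.util.HashSet;
import java.util.List;
import java.util.Locale;
import java.util.Set;
import java.util.function.Function;

public class HighlightMismatchDemo {

    // Hypothetical toy "stemming" analyzer standing in for the default field's
    // analyzer: lowercases, splits on whitespace, strips a trailing "ics".
    // This is NOT Lucene's Porter stemmer; it only mimics one of its effects.
    static List<String> stemmingAnalyzer(String text) {
        List<String> tokens = new ArrayList<>();
        for (String tok : text.toLowerCase(Locale.ROOT).split("\\s+")) {
            if (tok.isEmpty()) continue;
            if (tok.endsWith("ics")) tok = tok.substring(0, tok.length() - 3);
            tokens.add(tok);
        }
        return tokens;
    }

    // Hypothetical toy whitespace-only analyzer: no lowercasing, no stemming.
    static List<String> whitespaceAnalyzer(String text) {
        List<String> tokens = new ArrayList<>();
        for (String tok : text.split("\\s+")) {
            if (!tok.isEmpty()) tokens.add(tok);
        }
        return tokens;
    }

    // Wraps each whitespace-separated word of fieldValue in <em>...</em> when the
    // field's own analyzer turns it into a term contained in queryTerms.
    static String highlight(String fieldValue, Set<String> queryTerms,
                            Function<String, List<String>> fieldAnalyzer) {
        List<String> out = new ArrayList<>();
        for (String word : fieldValue.split("\\s+")) {
            if (word.isEmpty()) continue;
            boolean hit = false;
            for (String term : fieldAnalyzer.apply(word)) {
                if (queryTerms.contains(term)) { hit = true; break; }
            }
            out.add(hit ? "<em>" + word + "</em>" : word);
        }
        return String.join(" ", out);
    }

    public static void main(String[] args) {
        // The query is analyzed ONCE, with the default field's analyzer...
        Set<String> queryTerms = new HashSet<>(stemmingAnalyzer("electronics"));
        // ...and the same terms are then applied to every highlighted field.
        System.out.println("features: " + highlight("electronics connector",
                queryTerms, HighlightMismatchDemo::stemmingAnalyzer));
        System.out.println("cat:      " + highlight("electronics connector",
                queryTerms, HighlightMismatchDemo::whitespaceAnalyzer));
    }
}
```

          Running main prints "features: <em>electronics</em> connector" and "cat:      electronics connector": the query term "electron" never equals the unstemmed cat token "electronics". A query for "connector" is highlighted in both fields because the toy stemmer leaves it unchanged, which mirrors the behaviour in the report.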

          I'm not sure what the best solution is here. In SOLR-1910 I've proposed an option that can help in some cases. Another possibility would be a new highlighter option, hl.useHighlightedFieldAsDefaultField, which would create a new Query object using the analyzer of the field being highlighted: not just once at the start of highlighting, but separately for each field that's getting highlighted.
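          A rough sketch of the per-field idea, again as a toy Java model rather than Solr code. hl.useHighlightedFieldAsDefaultField is only proposed in this comment, and the analyzers, the highlightField helper, and the field values below are hypothetical. Instead of reusing terms from the default field, the raw query string is re-analyzed with each highlighted field's own analyzer, so query terms and field terms are produced the same way.

```java
import java.util.ArrayList;
import java.util.HashSet;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Locale;
import java.util.Map;
import java.util.Set;
import java.util.function.Function;

public class PerFieldHighlightDemo {

    // Hypothetical toy stemming analyzer (lowercases, strips a trailing "ics").
    // NOT Lucene's Porter stemmer.
    static List<String> stemmingAnalyzer(String text) {
        List<String> tokens = new ArrayList<>();
        for (String tok : text.toLowerCase(Locale.ROOT).split("\\s+")) {
            if (tok.isEmpty()) continue;
            if (tok.endsWith("ics")) tok = tok.substring(0, tok.length() - 3);
            tokens.add(tok);
        }
        return tokens;
    }

    // Hypothetical toy whitespace-only analyzer: no lowercasing, no stemming.
    static List<String> whitespaceAnalyzer(String text) {
        List<String> tokens = new ArrayList<>();
        for (String tok : text.split("\\s+")) {
            if (!tok.isEmpty()) tokens.add(tok);
        }
        return tokens;
    }

    // Sketch of the proposed behaviour: analyze the raw query with THIS field's
    // analyzer, then mark words whose analyzed form matches a query term.
    static String highlightField(String rawQuery, String fieldValue,
                                 Function<String, List<String>> fieldAnalyzer) {
        Set<String> queryTerms = new HashSet<>(fieldAnalyzer.apply(rawQuery));
        List<String> out = new ArrayList<>();
        for (String word : fieldValue.split("\\s+")) {
            if (word.isEmpty()) continue;
            boolean hit = false;
            for (String term : fieldAnalyzer.apply(word)) {
                if (queryTerms.contains(term)) { hit = true; break; }
            }
            out.add(hit ? "<em>" + word + "</em>" : word);
        }
        return String.join(" ", out);
    }

    public static void main(String[] args) {
        // Each highlighted field is paired with its own (toy) analyzer.
        Map<String, Function<String, List<String>>> fields = new LinkedHashMap<>();
        fields.put("features", PerFieldHighlightDemo::stemmingAnalyzer);
        fields.put("cat", PerFieldHighlightDemo::whitespaceAnalyzer);
        for (Map.Entry<String, Function<String, List<String>>> f : fields.entrySet()) {
            System.out.println(f.getKey() + ": "
                    + highlightField("electronics", "electronics connector", f.getValue()));
        }
    }
}
```

          With per-field analysis both lines print "<em>electronics</em> connector". The sketch treats the query as a bag of unqualified words, so it does not address what should happen to field-qualified clauses.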

          A complication for either approach is whether you should do anything special to parts of the query that do specify a particular field (e.g. "features:electronics").


            People

            • Assignee: Unassigned
            • Reporter: David Bowen
            • Votes: 0
            • Watchers: 6
