Solr / SOLR-8537

Phrase highlighter doesn't work when searching for a phrase containing some stopwords


Details

    • Type: Bug
    • Status: Closed
    • Priority: Major
    • Resolution: Fixed
    • Affects Version/s: 4.10.4
    • Fix Version/s: None
    • Component/s: highlighter
    • Labels: None

    Description

      When a phrase query contains 3 or more stopwords, the highlighting section of the response is empty.
      Example:

      solrconfig.xml
      <?xml version="1.0" encoding="UTF-8" ?>
      <config>
        <luceneMatchVersion>LUCENE_4_10</luceneMatchVersion>
        <requestHandler name="/admin/" class="org.apache.solr.handler.admin.AdminHandlers" />
        <requestHandler name="/select" class="solr.SearchHandler" />
        <requestHandler name="/update" class="solr.UpdateRequestHandler" />
        <requestHandler name="/analysis/field" class="solr.FieldAnalysisRequestHandler" startup="lazy"/>
      </config>
      
      schema.xml
      <?xml version="1.0" ?>
      <schema name="${solr.core.name}">
        <types>
          <fieldType name="long" class="solr.TrieLongField" precisionStep="0" positionIncrementGap="0"/>
          <fieldType name="string" class="solr.StrField" sortMissingLast="true" omitNorms="true"/>
          <fieldType name="text" class="solr.TextField" positionIncrementGap="100">
            <analyzer>
              <tokenizer class="solr.StandardTokenizerFactory"/>
              <filter class="solr.StopFilterFactory" ignoreCase="true" words="stopwords.txt" format="snowball" enablePositionIncrements="true"/>
            </analyzer>
          </fieldType>
        </types>
        <fields>
          <field name="_version_" type="long"     indexed="true"  stored="true" />
          <field name="id" type="string" indexed="true" stored="true" multiValued="false" />
          <field name="document_text" type="text" indexed="true" stored="true" multiValued="false" />
        </fields>
        <uniqueKey>id</uniqueKey>
        <defaultSearchField>document_text</defaultSearchField>
      </schema>
      
      stopwords.txt
      c
      e
      g
      

      Load this document:

      <add>
      <doc>
      <field name="id">1</field>
      <field name="document_text">a c b d a b c d e f g h i a f g b e</field>
      </doc>
      </add>
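
One way to load the document above is to POST it to the `/update` handler declared in solrconfig.xml. The following is a minimal sketch, assuming the host `myhost:8983` and core `test_hl` taken from the query URL in this report, and assuming an explicit `commit=true` so the document becomes searchable immediately. The helper names `build_update_request` and `post_document` are hypothetical.

```python
import urllib.request

# The <add> payload exactly as given in the report.
ADD_XML = """<add>
<doc>
<field name="id">1</field>
<field name="document_text">a c b d a b c d e f g h i a f g b e</field>
</doc>
</add>"""


def build_update_request(base_url, core, xml_body):
    """Build (but do not send) a POST request for the core's /update handler."""
    # commit=true is an assumption: it makes the document visible right away.
    url = f"{base_url}/{core}/update?commit=true"
    return urllib.request.Request(
        url,
        data=xml_body.encode("utf-8"),
        headers={"Content-Type": "text/xml; charset=utf-8"},
        method="POST",
    )


def post_document(request):
    """Send the request and return (HTTP status, response body)."""
    with urllib.request.urlopen(request) as response:
        return response.status, response.read().decode("utf-8")


# Host and core name are assumptions copied from the query URL in the report.
request = build_update_request("http://myhost:8983/solr", "test_hl", ADD_XML)
```

Calling `post_document(request)` against a running instance sends the document; building the request separately keeps the sketch inspectable without a server.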
      

      Execute query:
      http://myhost:8983/solr/test_hl/select?q=%22a+b+c+d+e+f+g+h%22&wt=json&indent=true&hl=true&hl.fl=document_text&hl.simple.pre=%3Cem%3E&hl.simple.post=%3C%2Fem%3E
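
The percent-encoded URL above is easier to read as a parameter list. This sketch rebuilds it with the standard library; the function name `build_highlight_query_url` is hypothetical, and the host and core are the ones used in the report.

```python
from urllib.parse import urlencode


def build_highlight_query_url(base_url, core, phrase, field):
    """Build a /select URL for a quoted phrase query with highlighting on one field."""
    params = [
        ("q", f'"{phrase}"'),          # double quotes make it a phrase query
        ("wt", "json"),
        ("indent", "true"),
        ("hl", "true"),                # enable highlighting
        ("hl.fl", field),              # field to highlight
        ("hl.simple.pre", "<em>"),     # markup placed before each highlighted term
        ("hl.simple.post", "</em>"),   # markup placed after each highlighted term
    ]
    return f"{base_url}/{core}/select?{urlencode(params)}"


failing_url = build_highlight_query_url(
    "http://myhost:8983/solr", "test_hl", "a b c d e f g h", "document_text"
)
working_url = build_highlight_query_url(
    "http://myhost:8983/solr", "test_hl", "a b c d e f g", "document_text"
)
```

The two URLs differ only in the phrase, which makes it straightforward to compare the failing and working cases side by side.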

      This is the result:

      {
        "responseHeader":{
          "status":0,
          "QTime":2},
        "response":{"numFound":1,"start":0,"docs":[
            {
              "id":"1",
              "document_text":"a c b d a b c d e f g h i a f g b e"}]
        },
        "highlighting":{
          "1":{}}}
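
The symptom can be detected mechanically: a document appears in `response.docs` but its entry under `highlighting` has no snippets. This is a small sketch that checks for that condition, assuming the unique key field is `id` as in the schema above; `docs_without_highlighting` is a hypothetical helper name.

```python
import json

# The response body as shown in the report.
RESPONSE_TEXT = """
{
  "responseHeader":{
    "status":0,
    "QTime":2},
  "response":{"numFound":1,"start":0,"docs":[
      {
        "id":"1",
        "document_text":"a c b d a b c d e f g h i a f g b e"}]
  },
  "highlighting":{
    "1":{}}}
"""


def docs_without_highlighting(response):
    """Return ids of matched documents whose highlighting entry is missing or empty."""
    highlighting = response.get("highlighting", {})
    return [
        doc["id"]
        for doc in response["response"]["docs"]
        if not highlighting.get(doc["id"])
    ]


missing = docs_without_highlighting(json.loads(RESPONSE_TEXT))
```

For the response above, `missing` contains the id of document 1, since the query matched it but no snippet was returned.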
      

      Highlighting for document 1 is empty.
      Searching for "a b c d e f g" works correctly.
      This problem does not affect Solr 5.4.
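
To see what the analyzer does to the two phrases, the sketch below simulates stopword removal with position increments preserved, as configured by `enablePositionIncrements="true"`. It is an approximation only: a whitespace split stands in for `StandardTokenizerFactory`, the function names are hypothetical, and it does not reproduce the highlighter itself, so it illustrates the token positions involved rather than the cause of the bug.

```python
STOPWORDS = {"c", "e", "g"}  # contents of stopwords.txt in the report

DOCUMENT = "a c b d a b c d e f g h i a f g b e"


def analyze(text, stopwords=STOPWORDS):
    """Split on whitespace, lowercase, and drop stopwords, keeping original positions."""
    return [
        (pos, tok)
        for pos, tok in enumerate(text.lower().split())
        if tok not in stopwords
    ]


def phrase_match_starts(doc_tokens, query_tokens):
    """Return document positions where the phrase matches with its gaps preserved."""
    if not query_tokens:
        return []
    doc_set = set(doc_tokens)
    base = query_tokens[0][0]
    # Offsets of each surviving query term relative to the first surviving term.
    relative = [(pos - base, tok) for pos, tok in query_tokens]
    return [
        pos
        for pos, tok in doc_tokens
        if tok == relative[0][1]
        and all((pos + off, t) in doc_set for off, t in relative)
    ]


doc_tokens = analyze(DOCUMENT)
failing_query = analyze("a b c d e f g h")
working_query = analyze("a b c d e f g")

failing_starts = phrase_match_starts(doc_tokens, failing_query)
working_starts = phrase_match_starts(doc_tokens, working_query)
```

Under these assumptions both phrases line up with the document starting at position 4, which is consistent with `numFound` being 1 for the failing query. The failing phrase keeps three position gaps between its surviving terms (after b, d, and f), while the working phrase keeps two because its final stopword is simply dropped.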

          People

            Assignee: Unassigned
            Reporter: Zaccheo Bagnati (zaccheob)
            Votes: 0
            Watchers: 3
