  1. Solr
  2. SOLR-3235

Whitespace issue in synonym list

Details

    • Type: New Feature
    • Status: Closed
    • Priority: Major
    • Resolution: Not A Problem
    • Affects Version/s: 3.4, 3.5
    • Fix Version/s: 3.4
    • Component/s: Schema and Analysis
    • Environment: Windows 7, Firefox 10.0.2, Solr example (start.jar)

    Description

      If you use the following schema.xml entry:

      <fieldType name="contenttype" class="solr.TextField" multiValued="true" omitNorms="true">
        <analyzer type="index">
          <tokenizer class="solr.KeywordTokenizerFactory"/>
          <filter class="solr.SynonymFilterFactory" synonyms="synonyms.txt" ignoreCase="true" expand="false"/>
        </analyzer>
        <analyzer type="query">
          <tokenizer class="solr.KeywordTokenizerFactory"/>
        </analyzer>
      </fieldType>

      With a synonym list containing an entry such as:

      text/html;\ charset=ISO-8859-1 => html

      Solr 3.4 and 3.5 cannot handle the whitespace between "html;" and "charset", and no synonym substitution takes place. The same configuration works fine in Solr 3.3.
      No exception or error is thrown.

      This is my first Jira ticket, so if I missed something, let me know...

      Regards

      Johannes

      Edit: OK, I found the solution to this problem. Configure the filter as follows:

      <filter class="solr.SynonymFilterFactory" synonyms="synonyms.txt" ignoreCase="true" expand="false"
      tokenizerFactory="solr.KeywordTokenizerFactory" />

      Set tokenizerFactory to "solr.KeywordTokenizerFactory" rather than "solr.WhitespaceTokenizerFactory", so that the synonym entry is not split at the whitespace.

      See the javadocs for more details:
      https://builds.apache.org/job/Solr-trunk/javadoc/org/apache/solr/analysis/SynonymFilterFactory.html
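
      For reference, the complete field type with the fix applied might look like the sketch below. It only merges the original definition with the tokenizerFactory attribute from the edit above; nothing else is changed, and it has not been verified against Solr versions other than those mentioned in this ticket.

```xml
<!-- Sketch: original "contenttype" field type plus the tokenizerFactory fix. -->
<fieldType name="contenttype" class="solr.TextField" multiValued="true" omitNorms="true">
  <analyzer type="index">
    <tokenizer class="solr.KeywordTokenizerFactory"/>
    <!-- tokenizerFactory controls how the entries in synonyms.txt are tokenized;
         KeywordTokenizerFactory keeps each side of a rule as one token. -->
    <filter class="solr.SynonymFilterFactory" synonyms="synonyms.txt" ignoreCase="true" expand="false"
            tokenizerFactory="solr.KeywordTokenizerFactory"/>
  </analyzer>
  <analyzer type="query">
    <tokenizer class="solr.KeywordTokenizerFactory"/>
  </analyzer>
</fieldType>
```

      With this in place, the synonyms.txt entry shown earlier should be read as a single token on the left-hand side, which matches the single token that KeywordTokenizerFactory emits at index time.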

          People

            Assignee: Unassigned
            Reporter: Johannes Brucher (jb@shi)