Solr / SOLR-7192

Poor performance of Hunspell with Czech Dictionary

    Details

    • Type: Bug
    • Status: Open
    • Priority: Major
    • Resolution: Unresolved
    • Affects Version/s: 5.0
    • Fix Version/s: None
    • Component/s: Schema and Analysis
    • Labels:
    • Environment:

      Linux vld091 3.2.0-4-amd64 #1 SMP Debian 3.2.51-1 x86_64 GNU/Linux

      Description

      Possibly related to SOLR-3245 (https://issues.apache.org/jira/browse/SOLR-3245); the symptoms are exactly the same.

      HunspellStemFilterFactory with the Czech dictionary is hundreds of times slower than CzechStemFilterFactory.

      Analyzer setup:

      <fieldtype name="text_cs" class="solr.TextField">
        <analyzer type="query">
          <tokenizer class="solr.WhitespaceTokenizerFactory" />
          <filter class="solr.LowerCaseFilterFactory" />
          <filter class="solr.WordDelimiterFilterFactory"
                  generateWordParts="1"
                  generateNumberParts="1"
                  catenateWords="0"
                  catenateNumbers="0"
                  catenateAll="0"
                  stemEnglishPossessive="0" />
          <filter class="solr.HunspellStemFilterFactory"
                  dictionary="cs_CZ.dic"
                  affix="cs_CZ.aff"
                  ignoreCase="true"
                  strictAffixParsing="true" />
          <filter class="solr.ASCIIFoldingFilterFactory" />
          <filter class="solr.RemoveDuplicatesTokenFilterFactory" />
        </analyzer>

        <analyzer type="index">
          <tokenizer class="solr.WhitespaceTokenizerFactory" />
          <filter class="solr.LowerCaseFilterFactory" />
          <filter class="solr.WordDelimiterFilterFactory"
                  generateWordParts="1"
                  generateNumberParts="1"
                  catenateWords="1"
                  catenateNumbers="1"
                  catenateAll="0"
                  stemEnglishPossessive="0" />
          <filter class="solr.HunspellStemFilterFactory"
                  dictionary="cs_CZ.dic"
                  affix="cs_CZ.aff"
                  ignoreCase="true"
                  strictAffixParsing="true" />
          <filter class="solr.ASCIIFoldingFilterFactory" />
          <filter class="solr.RemoveDuplicatesTokenFilterFactory" />
        </analyzer>
      </fieldtype>
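
      For comparison, a baseline field type using CzechStemFilterFactory might look like the sketch below. The report does not include the baseline configuration that was actually measured, so this is an assumption: the field type name text_cs_baseline is hypothetical, a single analyzer is used for both indexing and querying for brevity, and the chain otherwise mirrors the setup above with only the stemming filter swapped.

      <!-- Hypothetical baseline, not taken from the report: same chain, algorithmic Czech stemmer -->
      <fieldtype name="text_cs_baseline" class="solr.TextField">
        <analyzer>
          <tokenizer class="solr.WhitespaceTokenizerFactory" />
          <filter class="solr.LowerCaseFilterFactory" />
          <filter class="solr.WordDelimiterFilterFactory"
                  generateWordParts="1"
                  generateNumberParts="1"
                  catenateWords="0"
                  catenateNumbers="0"
                  catenateAll="0"
                  stemEnglishPossessive="0" />
          <!-- CzechStemFilterFactory takes no dictionary or affix files -->
          <filter class="solr.CzechStemFilterFactory" />
          <filter class="solr.ASCIIFoldingFilterFactory" />
          <filter class="solr.RemoveDuplicatesTokenFilterFactory" />
        </analyzer>
      </fieldtype>

      Indexing the same documents into a field of each type should isolate the cost of the stemming filter, since every other stage of the analysis chain is identical.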

        Attachments

        1. cz_CZ.zip
          533 kB
          Michal Danilak

            People

            • Assignee: Unassigned
            • Reporter: Mimino Michal Danilak
            • Votes: 0
            • Watchers: 1