Solr / SOLR-14434

Add documentation for adding multiterm analyzers in Schema API


Details

    • Type: Bug
    • Status: Open
    • Priority: Major
    • Resolution: Unresolved
    • Affects Version/s: None
    • Fix Version/s: None
    • Component/s: Schema and Analysis
    • Labels: None

    Description

      I originally filed this as a bug report, but on further inspection I realized the behavior is simply undocumented: it results from inconsistent casing of the property name between the XML schema and the JSON Schema API. I am changing this to a Jira for adding documentation so others don't run into this issue in the future.

      We also need to document that the "analysis/field" API ignores the multiterm analyzer and thus doesn't fully reflect how incoming queries are analyzed. This has been an annoying quirk for years and I think it would be worth fixing, but for now we should at least document it.
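
      To make the "analysis/field" caveat concrete, here is a minimal sketch that builds such a request for a wildcard term. The base URL and collection name are hypothetical, and the field type name is the "multiterm_test" type defined later in this description. As noted above, the response to a request like this should be expected to show the regular query analyzer chain rather than the multiterm chain.

```python
from urllib.parse import urlencode

# Hypothetical base URL and collection name; adjust for a real deployment.
SOLR_BASE = "http://localhost:8983/solr/mycollection"


def build_field_analysis_url(field_type: str, query_text: str) -> str:
    """Build a GET URL for the field analysis handler (/analysis/field)."""
    params = {
        "analysis.fieldtype": field_type,  # analyze against a field type
        "analysis.query": query_text,      # text to run through query-time analysis
        "wt": "json",
    }
    return f"{SOLR_BASE}/analysis/field?{urlencode(params)}"


if __name__ == "__main__":
    # The trailing "*" is percent-encoded by urlencode.
    print(build_field_analysis_url("multiterm_test", "hats*"))
```

      The sketch only constructs the URL; it does not contact Solr, so it can be inspected without a running instance.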

      --------------

      In addition to "index" and "query" analyzers, Solr supports adding an explicit "multiterm" analyzer to schema fieldType definitions. This allows specific control over the analysis of wildcard terms, prefix queries, range queries, etc. For example, with the following definition the wildcard query "hats*" is stemmed to "hat*" instead of being left as "hats*", and thus matches the indexed term "hat".

        <fieldType class="solr.TextField" multiValued="true" name="multiterm_test" positionIncrementGap="100" termOffsets="true" termVectors="true">
          <analyzer type="index">
            <tokenizer class="solr.ClassicTokenizerFactory"/>
            <filter class="solr.LowerCaseFilterFactory"/>
            <filter class="solr.EnglishMinimalStemFilterFactory"/>
          </analyzer>
          <analyzer type="query">
            <tokenizer class="solr.ClassicTokenizerFactory"/>
            <filter class="solr.SynonymGraphFilterFactory" expand="true" ignoreCase="true" synonyms="synonyms.txt"/>
            <filter class="solr.LowerCaseFilterFactory"/>
            <filter class="solr.EnglishMinimalStemFilterFactory"/>
          </analyzer>
          <analyzer type="multiterm">
            <tokenizer class="solr.ClassicTokenizerFactory"/>
            <filter class="solr.LowerCaseFilterFactory"/>
            <filter class="solr.EnglishMinimalStemFilterFactory"/>
          </analyzer>
        </fieldType>

      In the XML version this analyzer is called "multiterm", whereas it's "multiTerm" in the JSON Schema API. This isn't in the documentation anywhere, and it cost me a bunch of time debugging through the code until I finally found what was going on. I am using this ticket to add better documentation on the usage of this feature and the gotchas around it.
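
      As a rough illustration of the casing difference, the sketch below builds a Schema API "add-field-type" command that mirrors the XML definition above. It assumes the JSON key is "multiTermAnalyzer", by analogy with the "indexAnalyzer" and "queryAnalyzer" keys; this description only establishes the capital-T "multiTerm" spelling, so the exact key should be verified against the Solr version in use. The function name is hypothetical.

```python
import json

# Building blocks copied from the XML fieldType definition.
TOKENIZER = {"class": "solr.ClassicTokenizerFactory"}
LOWERCASE = {"class": "solr.LowerCaseFilterFactory"}
STEMMER = {"class": "solr.EnglishMinimalStemFilterFactory"}
SYNONYMS = {
    "class": "solr.SynonymGraphFilterFactory",
    "expand": "true",
    "ignoreCase": "true",
    "synonyms": "synonyms.txt",
}


def build_add_field_type_command() -> dict:
    """Return a JSON-serializable add-field-type command (a sketch, not verified against Solr)."""
    return {
        "add-field-type": {
            "name": "multiterm_test",
            "class": "solr.TextField",
            "multiValued": True,
            "positionIncrementGap": "100",
            "termOffsets": True,
            "termVectors": True,
            "indexAnalyzer": {
                "tokenizer": TOKENIZER,
                "filters": [LOWERCASE, STEMMER],
            },
            "queryAnalyzer": {
                "tokenizer": TOKENIZER,
                "filters": [SYNONYMS, LOWERCASE, STEMMER],
            },
            # ASSUMPTION: camel-case key with a capital "T", unlike the
            # all-lowercase type="multiterm" used in the XML schema.
            "multiTermAnalyzer": {
                "tokenizer": TOKENIZER,
                "filters": [LOWERCASE, STEMMER],
            },
        }
    }


if __name__ == "__main__":
    print(json.dumps(build_add_field_type_command(), indent=2))
```

      The printed JSON could then be sent as the body of a POST to the collection's schema endpoint. The point of the sketch is the key spelling: the XML attribute value is all lowercase, while the JSON key is camel case.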

       

          People

            Assignee: Unassigned
            Reporter: Trey Grainger (solrtrey)
            Votes: 0
            Watchers: 1
