[SOLR-14434] Add documentation for adding multiterm analyzers in Schema API - ASF JIRA

Details

Type: Bug
Status: Open
Priority: Major
Resolution: Unresolved
Affects Version/s: None
Fix Version/s: None
Component/s: Schema and Analysis
Labels: None

Description

Originally this was filed as a bug report, but on further inspection I realized the usage was simply undocumented, and the behavior I hit was the result of an inconsistent property name (different casing) between the XML and JSON formats. I am changing this ticket to a request for documentation so others don't run into the same issue in the future.

We also need to document that the "analysis/field" API ignores multiterm analysis and thus doesn't fully reflect how incoming queries are analyzed. This has been an annoying quirk for years and I think it would be worth fixing, but for now we should at least document it.

--------------

In addition to "index" and "query" analyzers, Solr supports adding an explicit "multiterm" analyzer to schema fieldType definitions. This gives specific control over analysis for wildcard terms, prefix queries, range queries, etc. For example, with the following definition the wildcard query "hats*" is stemmed to "hat*" rather than left as "hats*", and thus matches the indexed term "hat".

  <fieldType class="solr.TextField" multiValued="true" name="multiterm_test" positionIncrementGap="100" termOffsets="true" termVectors="true">
    <analyzer type="index">
      <tokenizer class="solr.ClassicTokenizerFactory"/>
      <filter class="solr.LowerCaseFilterFactory"/>
      <filter class="solr.EnglishMinimalStemFilterFactory"/>
    </analyzer>
    <analyzer type="query">
      <tokenizer class="solr.ClassicTokenizerFactory"/>
      <filter class="solr.SynonymGraphFilterFactory" expand="true" ignoreCase="true" synonyms="synonyms.txt"/>
      <filter class="solr.LowerCaseFilterFactory"/>
      <filter class="solr.EnglishMinimalStemFilterFactory"/>
    </analyzer>
    <analyzer type="multiterm">
      <tokenizer class="solr.ClassicTokenizerFactory"/>
      <filter class="solr.LowerCaseFilterFactory"/>
      <filter class="solr.EnglishMinimalStemFilterFactory"/>
    </analyzer>
  </fieldType>

In the XML version this analyzer is called "multiterm", whereas it's "multiTerm" in the JSON API. This isn't in the documentation anywhere, and it cost me a lot of time debugging through the code until I finally found what was going on. I'm using this ticket to add better documentation of the usage and gotchas around this feature.
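
To make the casing difference concrete, here is a minimal sketch in Python that builds a Schema API "add-field-type" payload equivalent to the XML above. It assumes the full JSON key is "multiTermAnalyzer" (alongside "indexAnalyzer" and "queryAnalyzer"); that is my reading of the "multiTerm" casing described here, so please verify it against your Solr version. The helper names (analyzer, build_add_field_type) are made up for illustration and are not part of Solr.

```python
import json

# XML <analyzer type="..."> value -> JSON Schema API key.
# ASSUMPTION: "multiTermAnalyzer" is the full key behind the "multiTerm"
# casing mentioned in this ticket; check it against your Solr version.
XML_TYPE_TO_JSON_KEY = {
    "index": "indexAnalyzer",
    "query": "queryAnalyzer",
    "multiterm": "multiTermAnalyzer",
}

LOWERCASE = {"class": "solr.LowerCaseFilterFactory"}
STEM = {"class": "solr.EnglishMinimalStemFilterFactory"}
SYNONYMS = {
    "class": "solr.SynonymGraphFilterFactory",
    "expand": "true",
    "ignoreCase": "true",
    "synonyms": "synonyms.txt",
}


def analyzer(*filters):
    """Hypothetical helper: one analyzer using the tokenizer from the XML example."""
    return {
        "tokenizer": {"class": "solr.ClassicTokenizerFactory"},
        "filters": list(filters),
    }


def build_add_field_type(name):
    """Hypothetical helper: build the add-field-type command body."""
    analyzers_by_xml_type = {
        "index": analyzer(LOWERCASE, STEM),
        "query": analyzer(SYNONYMS, LOWERCASE, STEM),
        "multiterm": analyzer(LOWERCASE, STEM),
    }
    field_type = {
        "name": name,
        "class": "solr.TextField",
        "multiValued": True,
        "positionIncrementGap": "100",
        "termOffsets": True,
        "termVectors": True,
    }
    # Translate the XML type names into the camel-cased JSON keys.
    for xml_type, definition in analyzers_by_xml_type.items():
        field_type[XML_TYPE_TO_JSON_KEY[xml_type]] = definition
    return {"add-field-type": field_type}


payload = build_add_field_type("multiterm_test")
print(json.dumps(payload, indent=2))
```

The printed JSON would be sent as the body of a POST to the collection's /schema endpoint. The point of the sketch is the key mapping: the lowercase name "multiterm" from the XML does not appear as a key anywhere in the JSON payload.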

People

Assignee: Unassigned

Reporter: Trey Grainger

Votes: 0

Watchers: 1

Dates

Created: 24/Apr/20 04:02

Updated: 27/May/20 18:05