Details
Type: Bug
Status: Closed
Priority: Major
Resolution: Fixed
Versions: 7.7.2, 8.2
Description
SolrResourceLoader will attempt to do some magic to infer what the user wanted when loading TokenFilter and Tokenizer classes. However, this can end up putting the wrong class in the cache such that the request succeeds the first time but fails subsequent times. It should either succeed or fail consistently on every call.
This can be triggered in a variety of ways, but perhaps the simplest is to reference the wrong kind of factory in an index analyzer chain, for example a tokenizer factory class in a filter element. Consider the field type definition:
<fieldType name="text_en_partial" class="solr.TextField">
  <analyzer type="index">
    <tokenizer class="solr.WhitespaceTokenizerFactory"/>
    <filter class="solr.NGramTokenizerFactory" minGramSize="1" maxGramSize="2"/>
  </analyzer>
</fieldType>
If this schema is loaded by itself (e.g. in a Docker container for standalone validation), it passes validation and the collection is created successfully, with Solr actually figuring out that it needs an NGramTokenFilterFactory. However, if it is loaded on a cluster with other collections where NGramTokenizerFactory has already been loaded correctly, we get a ClassCastException. Conversely, if this collection is loaded first, the other collections that use the tokenizer fail instead.
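To make the order dependence concrete, here is a minimal sketch of how a class cache keyed only by the configured name could produce this behavior. It is a toy model, not the actual SolrResourceLoader code: the class NaiveLoaderSketch, its nested stand-in factory types, the LEGACY pattern, and lookupByBaseName are all hypothetical names introduced for illustration.

```java
import java.util.HashMap;
import java.util.Map;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Toy model only: none of these types are the real Solr/Lucene classes.
class NaiveLoaderSketch {
    interface TokenizerFactory {}
    interface TokenFilterFactory {}
    static class NGramTokenizerFactory implements TokenizerFactory {}
    static class NGramTokenFilterFactory implements TokenFilterFactory {}

    // Assumed "magic": strip the suffix and keep only the base name ("NGram").
    private static final Pattern LEGACY =
        Pattern.compile("solr\\.(.*?)(TokenFilter|Tokenizer)Factory");

    // The cache is keyed by the configured name alone, ignoring the expected type.
    private final Map<String, Class<?>> cache = new HashMap<>();

    // Hypothetical registry: resolves a base name for the type the caller expects.
    private static Class<?> lookupByBaseName(String base, Class<?> expected) {
        if (base.equals("NGram") && expected == TokenizerFactory.class) {
            return NGramTokenizerFactory.class;
        }
        if (base.equals("NGram") && expected == TokenFilterFactory.class) {
            return NGramTokenFilterFactory.class;
        }
        throw new IllegalArgumentException("No " + expected.getSimpleName() + " for " + base);
    }

    <T> Class<? extends T> findClass(String cname, Class<T> expected) {
        Class<?> cached = cache.get(cname);
        if (cached != null) {
            // Throws ClassCastException if an earlier caller cached a different kind of class.
            return cached.asSubclass(expected);
        }
        Matcher m = LEGACY.matcher(cname);
        if (!m.matches()) {
            throw new IllegalArgumentException("Unknown class: " + cname);
        }
        Class<?> found = lookupByBaseName(m.group(1), expected);
        cache.put(cname, found); // the inferred class is stored under the name as written
        return found.asSubclass(expected);
    }
}
```

In this sketch, whichever caller asks for "solr.NGramTokenizerFactory" first decides what is cached under that name, so a later caller expecting the other type fails with a ClassCastException.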
I'd argue that succeeding on both calls is the better approach because it does what the user likely wants instead of what the user explicitly asks for, and creates a nicer user experience that is marginally less pedantic.
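One way to get the "succeed on both calls" behavior, sketched under the same toy assumptions, is to include the expected type in the cache key so that an inferred class is never returned to a caller expecting something else. This is not necessarily how the fix was implemented in Solr; TypedCacheLoaderSketch, its nested stand-in types, and lookupByBaseName are hypothetical.

```java
import java.util.HashMap;
import java.util.Map;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Toy model only: none of these types are the real Solr/Lucene classes.
class TypedCacheLoaderSketch {
    interface TokenizerFactory {}
    interface TokenFilterFactory {}
    static class NGramTokenizerFactory implements TokenizerFactory {}
    static class NGramTokenFilterFactory implements TokenFilterFactory {}

    private static final Pattern LEGACY =
        Pattern.compile("solr\\.(.*?)(TokenFilter|Tokenizer)Factory");

    // Keyed by expected type plus configured name, so entries cannot collide across types.
    private final Map<String, Class<?>> cache = new HashMap<>();

    // Hypothetical registry: resolves a base name for the type the caller expects.
    private static Class<?> lookupByBaseName(String base, Class<?> expected) {
        if (base.equals("NGram") && expected == TokenizerFactory.class) {
            return NGramTokenizerFactory.class;
        }
        if (base.equals("NGram") && expected == TokenFilterFactory.class) {
            return NGramTokenFilterFactory.class;
        }
        throw new IllegalArgumentException("No " + expected.getSimpleName() + " for " + base);
    }

    <T> Class<? extends T> findClass(String cname, Class<T> expected) {
        String key = expected.getName() + "|" + cname;
        Class<?> cached = cache.get(key);
        if (cached != null) {
            return cached.asSubclass(expected); // always matches: the key includes the type
        }
        Matcher m = LEGACY.matcher(cname);
        if (!m.matches()) {
            throw new IllegalArgumentException("Unknown class: " + cname);
        }
        Class<?> found = lookupByBaseName(m.group(1), expected);
        cache.put(key, found);
        return found.asSubclass(expected);
    }
}
```

With this keying, the misconfigured filter element and a correctly configured tokenizer element can both resolve "solr.NGramTokenizerFactory" in either order, and repeated calls return the same result each time.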