Solr / SOLR-13861

SynonymGraphFilterFactory - with pattern tokenizer - not able to start


Details

    • Type: Bug
    • Status: Open
    • Priority: Major
    • Resolution: Unresolved
    • Affects Version/s: 7.7.2
    • Fix Version/s: None
    • Component/s: search

    Description

      Hi,

      we have a problem defining SynonymGraphFilterFactory when it uses SimplePatternTokenizerFactory. It seems that Solr loses the tokenizerFactory.pattern attribute while processing the schema.

      <fieldType name="text_synonym" class="solr.TextField">
        <analyzer type="index">
          <tokenizer class="solr.SimplePatternTokenizerFactory" pattern="[^,]+"/>
        </analyzer>
        <analyzer type="query">
          <tokenizer class="solr.SimplePatternTokenizerFactory" pattern="[^,]+"/>
          <filter class="solr.SynonymGraphFilterFactory"
                  synonyms="synonyms.txt"
                  expand="false"
                  tokenizerFactory="solr.SimplePatternTokenizerFactory"
                  tokenizerFactory.pattern="[^,]+"/>
        </analyzer>
      </fieldType>
      

      We got an exception like this:

      Caused by: java.lang.IllegalArgumentException: Configuration Error: missing parameter 'pattern'
              at org.apache.lucene.analysis.util.AbstractAnalysisFactory.require(AbstractAnalysisFactory.java:97)
              at org.apache.lucene.analysis.pattern.SimplePatternTokenizerFactory.<init>(SimplePatternTokenizerFactory.java:68)
              ... 58 more
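      The error names `pattern`, not `tokenizerFactory.pattern`, because the synonym filter forwards its `tokenizerFactory.`-prefixed attributes to the inner tokenizer with the prefix stripped. A minimal sketch of that step, assuming a simplified model in which only prefixed keys are forwarded (the real factory handles its remaining arguments differently); `TokArgsSketch` and `extractTokArgs` are hypothetical names, not Lucene API:

```java
import java.util.HashMap;
import java.util.Iterator;
import java.util.Map;

// Hypothetical, simplified illustration (not Lucene code): turn the synonym
// filter's "tokenizerFactory.*" attributes into args for the inner tokenizer.
class TokArgsSketch {
  static final String PREFIX = "tokenizerFactory.";

  // Moves every prefixed entry out of filterArgs into a new map,
  // stripping the "tokenizerFactory." prefix from the key.
  static Map<String, String> extractTokArgs(Map<String, String> filterArgs) {
    Map<String, String> tokArgs = new HashMap<>();
    for (Iterator<Map.Entry<String, String>> it = filterArgs.entrySet().iterator(); it.hasNext();) {
      Map.Entry<String, String> e = it.next();
      if (e.getKey().startsWith(PREFIX)) {
        tokArgs.put(e.getKey().substring(PREFIX.length()), e.getValue());
        it.remove(); // the filter itself no longer sees this attribute
      }
    }
    return tokArgs;
  }

  public static void main(String[] args) {
    Map<String, String> filterArgs = new HashMap<>();
    filterArgs.put("synonyms", "synonyms.txt");
    filterArgs.put("tokenizerFactory.pattern", "[^,]+");
    System.out.println(extractTokArgs(filterArgs)); // {pattern=[^,]+}
  }
}
```

      So the SimplePatternTokenizerFactory constructor expects to find `pattern` in the map it receives; if that map has been emptied, `require` fails exactly as in the stack trace.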
      

      We debugged this issue and found that the problem is in this method of SynonymGraphFilterFactory, which is called more than once:

      // (there are no tests for this functionality)
      private TokenizerFactory loadTokenizerFactory(ResourceLoader loader, String cname) throws IOException {
        Class<? extends TokenizerFactory> clazz = loader.findClass(cname, TokenizerFactory.class);
        try {
          // tokArgs is passed directly: the factory constructor removes the
          // parameters it reads, so the shared map is emptied by the first call
          TokenizerFactory tokFactory = clazz.getConstructor(Map.class).newInstance(tokArgs);
          if (tokFactory instanceof ResourceLoaderAware) {
            ((ResourceLoaderAware) tokFactory).inform(loader);
          }
          return tokFactory;
        } catch (Exception e) {
          throw new RuntimeException(e);
        }
      }
      

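      On the first call, the tokenizer factory constructor consumes the entries of tokArgs, leaving the map empty. On the second call, the map no longer contains pattern, so Solr reports the missing parameter.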
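      The failure mode can be reproduced without Solr. A minimal sketch, assuming a hypothetical stand-in factory (`ConsumingFactory`) that, like Lucene's `AbstractAnalysisFactory.require()`, removes each parameter it reads from the map it is given; `SharedArgsRepro.buildTwice` is also a hypothetical name that mimics two calls to `loadTokenizerFactory` with the same `tokArgs`:

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical stand-in for SimplePatternTokenizerFactory: it removes
// each parameter it reads, as AbstractAnalysisFactory.require() does.
class ConsumingFactory {
  final String pattern;

  ConsumingFactory(Map<String, String> args) {
    String p = args.remove("pattern");
    if (p == null) {
      throw new IllegalArgumentException("Configuration Error: missing parameter 'pattern'");
    }
    this.pattern = p;
  }
}

class SharedArgsRepro {
  // Builds the factory twice from the SAME map. Returns the error message
  // of the second construction, or null if both constructions succeed.
  static String buildTwice(Map<String, String> tokArgs) {
    new ConsumingFactory(tokArgs); // first call: consumes "pattern"
    try {
      new ConsumingFactory(tokArgs); // second call: the map is now empty
      return null;
    } catch (IllegalArgumentException e) {
      return e.getMessage();
    }
  }

  public static void main(String[] args) {
    Map<String, String> tokArgs = new HashMap<>();
    tokArgs.put("pattern", "[^,]+");
    System.out.println(buildTwice(tokArgs)); // Configuration Error: missing parameter 'pattern'
    System.out.println(tokArgs.isEmpty());   // true
  }
}
```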

      As a workaround, we changed the constructor call to pass a copy of the map:

      TokenizerFactory tokFactory = clazz.getConstructor(Map.class).newInstance(new HashMap<>(tokArgs));

      This creates a new map from tokArgs on every call, so the copy can be consumed without clearing tokArgs itself. However, I think there must be a better solution for this issue than copying the tokArgs map.
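      A minimal sketch of the copy-per-call workaround, with one variation that is only our suggestion and not an upstream fix: the parsed tokenizer arguments are stored in an unmodifiable map, so any code path that passes them straight to a consuming constructor fails fast with `UnsupportedOperationException` instead of silently emptying the map. `CopyPerCallSketch` and `ConsumingFactory2` are hypothetical names:

```java
import java.util.Collections;
import java.util.HashMap;
import java.util.Map;

// Hypothetical stand-in factory that removes the parameters it reads.
class ConsumingFactory2 {
  final String pattern;

  ConsumingFactory2(Map<String, String> args) {
    String p = args.remove("pattern");
    if (p == null) {
      throw new IllegalArgumentException("Configuration Error: missing parameter 'pattern'");
    }
    this.pattern = p;
  }
}

class CopyPerCallSketch {
  // Read-only master copy of the tokenizer args; passing it directly to a
  // consuming constructor would throw instead of silently clearing it.
  private final Map<String, String> tokArgs;

  CopyPerCallSketch(Map<String, String> parsed) {
    this.tokArgs = Collections.unmodifiableMap(new HashMap<>(parsed));
  }

  // Each call hands the factory its own mutable copy, as in the workaround.
  ConsumingFactory2 load() {
    return new ConsumingFactory2(new HashMap<>(tokArgs));
  }

  public static void main(String[] args) {
    Map<String, String> parsed = new HashMap<>();
    parsed.put("pattern", "[^,]+");
    CopyPerCallSketch loader = new CopyPerCallSketch(parsed);
    System.out.println(loader.load().pattern); // [^,]+
    System.out.println(loader.load().pattern); // [^,]+ (second call still works)
  }
}
```

      The copy costs one small map allocation per load, which is negligible compared to schema loading.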

      After that change, the field type shown above works without problems.


          People

            Assignee: Unassigned
            Reporter: Profimedia
            Votes: 0
            Watchers: 1
