Solr
  1. Solr
  2. SOLR-4813

Unavoidable IllegalArgumentException occurs when SynonymFilterFactory's setting has tokenizer factory's parameter.

    Details

      Description

      When I write SynonymFilterFactory' setting in schema.xml as follows, ...

      <analyzer>
        <tokenizer class="solr.NGramTokenizerFactory" maxGramSize="2" minGramSize="2"/>
        <filter class="solr.SynonymFilterFactory" synonyms="synonyms.txt" ignoreCase="true" expand="true"
         tokenizerFactory="solr.NGramTokenizerFactory" maxGramSize="2" minGramSize="2"/>
      </analyzer>
      

      IllegalArgumentException ("Unknown parameters") occurs.

      Caused by: java.lang.IllegalArgumentException: Unknown parameters: {maxGramSize=2, minGramSize=2}
      	at org.apache.lucene.analysis.synonym.FSTSynonymFilterFactory.<init>(FSTSynonymFilterFactory.java:71)
      	at org.apache.lucene.analysis.synonym.SynonymFilterFactory.<init>(SynonymFilterFactory.java:50)
      	... 28 more
      

      However TokenizerFactory's params should be set to loadTokenizerFactory method in [FST|Slow]SynonymFilterFactory. (ref. SOLR-2909)

      I think, the problem was caused by LUCENE-4877 ("Fix analyzer factories to throw exception when arguments are invalid") and SOLR-3402 ("Parse Version outside of Analysis Factories").

      1. SOLR-4813__4x.patch
        27 kB
        Hoss Man
      2. SOLR-4813.patch
        7 kB
        Hoss Man
      3. SOLR-4813.patch
        6 kB
        Shingo Sasaki

        Issue Links

          Activity

          Hide
          Jack Krupansky added a comment -

          The problem is not that you have a tokenizerFactory attribute, but that you are trying to use a tokenizer that has attributes. Solr is simply complaining that you have two attributes, maxGramSize and minGramSize that are not defined for the SynonymFilterFactory. Yes, that is a new feature in Solr! If your code was working in a previous release, you were lucky - it would have been using the default min and max of 1 and 1.

          The synonym tokenizerFactory attribute has no provision for passing attributes to the synonym tokenizer. For now, you'll have to create a custom ngram tokenizer factor with the desired settings.

          Show
          Jack Krupansky added a comment - The problem is not that you have a tokenizerFactory attribute, but that you are trying to use a tokenizer that has attributes. Solr is simply complaining that you have two attributes, maxGramSize and minGramSize that are not defined for the SynonymFilterFactory. Yes, that is a new feature in Solr! If your code was working in a previous release, you were lucky - it would have been using the default min and max of 1 and 1. The synonym tokenizerFactory attribute has no provision for passing attributes to the synonym tokenizer. For now, you'll have to create a custom ngram tokenizer factor with the desired settings.
          Hide
          Shingo Sasaki added a comment -

          Hi, Jack. Thank you for your comment.
          Do you mean that backward compatibility was lost?

          On samples of SOLR-319, the filter tag of SynonymFilterFactory has attributes, minGramSize and maxGramSize,
          and TokenizerFactory's instance is inited by args in FSTSynonymFilterFactory.loadTokenizerFactory of Lucene 4.2.1's.

          Show
          Shingo Sasaki added a comment - Hi, Jack. Thank you for your comment. Do you mean that backward compatibility was lost? On samples of SOLR-319 , the filter tag of SynonymFilterFactory has attributes, minGramSize and maxGramSize, and TokenizerFactory's instance is inited by args in FSTSynonymFilterFactory.loadTokenizerFactory of Lucene 4.2.1's.
          Hide
          Shingo Sasaki added a comment -

          I posted the patch of bug fix and improvement. Please review it.

          The improvement is following.

          If TokenizerFactory's parameter name is the same as SynonymFilterFactory's parameter name, you must add prefix "tokenizerFactory." to TokenizerFactory's parameter name.

          Show
          Shingo Sasaki added a comment - I posted the patch of bug fix and improvement. Please review it. The improvement is following. If TokenizerFactory's parameter name is the same as SynonymFilterFactory's parameter name, you must add prefix "tokenizerFactory." to TokenizerFactory's parameter name.
          Hide
          Jack Krupansky added a comment -

          Yes, you are right, this is a regression - I checked 4.2 - all the original "args" get passed to the tokenizer factory. And your suggested improvement makes sense.

          You should lobby to have this fix in 4.3.1.

          Some enhancement to the synonym filter factory Javadoc is also needed - to explain how tokenizer factory args are passed.

          Show
          Jack Krupansky added a comment - Yes, you are right, this is a regression - I checked 4.2 - all the original "args" get passed to the tokenizer factory. And your suggested improvement makes sense. You should lobby to have this fix in 4.3.1. Some enhancement to the synonym filter factory Javadoc is also needed - to explain how tokenizer factory args are passed.
          Hide
          Hoss Man added a comment -

          Thanks for the bug report, and especially for the patch with test case!

          your patch looks correct to me at first glance, but i'll review more closely and try to get in for 4.3.1

          Show
          Hoss Man added a comment - Thanks for the bug report, and especially for the patch with test case! your patch looks correct to me at first glance, but i'll review more closely and try to get in for 4.3.1
          Hide
          Hoss Man added a comment -

          Shingo: a flaw in your test was that "inform" was never called on the SynonymFilterFactory you were manually constructing, so it never attempted to instantiate your "TestTokenizerFactory", so all of the checks you had in it's constructor were being ignored.

          I simplified the test to use tokenFilterFactory() to try and deal with this, but that exposed another problem: since your TestTokenizerFactory class isn't registered with SPI, "lookupClass()" fails on it.

          To simplify all of this, I changed the test to us a real tokenizer factory with mandatory init arg (PatternTokenizerFactory), so we can check both the positive and negative cases – this includes an explicit check that specifying params which neither the SynonymFilterFactor nor the tokenizerFactory are expecting causes an error.

          I also tweaked your javadocs to try and clarify that the param prefix could be used even if the param names don't conflict, and re-ordered the param parsing code to group of of the tokenizerFactory stuff together.

          still running full tests.

          Show
          Hoss Man added a comment - Shingo: a flaw in your test was that "inform" was never called on the SynonymFilterFactory you were manually constructing, so it never attempted to instantiate your "TestTokenizerFactory", so all of the checks you had in it's constructor were being ignored. I simplified the test to use tokenFilterFactory() to try and deal with this, but that exposed another problem: since your TestTokenizerFactory class isn't registered with SPI, "lookupClass()" fails on it. To simplify all of this, I changed the test to us a real tokenizer factory with mandatory init arg (PatternTokenizerFactory), so we can check both the positive and negative cases – this includes an explicit check that specifying params which neither the SynonymFilterFactor nor the tokenizerFactory are expecting causes an error. I also tweaked your javadocs to try and clarify that the param prefix could be used even if the param names don't conflict, and re-ordered the param parsing code to group of of the tokenizerFactory stuff together. still running full tests.
          Hide
          Shingo Sasaki added a comment -

          Jack: Thanks for your check and advice.

          Hoss: Thank you for compensating for the shortcomings of my patch.

          By the way, the patch is applied to SynonymFilterFactory of trunk, but SynonymFilterFactory of branch_4x is delegatee of [FST|Slow]SynonymFilterFactory.

          Is branch_4x's patch required for committing to 4.3.1?

          Show
          Shingo Sasaki added a comment - Jack: Thanks for your check and advice. Hoss: Thank you for compensating for the shortcomings of my patch. By the way, the patch is applied to SynonymFilterFactory of trunk, but SynonymFilterFactory of branch_4x is delegatee of [FST|Slow] SynonymFilterFactory. Is branch_4x's patch required for committing to 4.3.1?
          Hide
          Hoss Man added a comment -

          Hoss: Thank you for compensating for the shortcomings of my patch.

          Hey man, you did the hard part – i just gave it a second pair of eyes and noticed the test glitch.

          Is branch_4x's patch required for committing to 4.3.1?

          Nope, i'll take care of the backport – but thanks forthe reminder about FST vs Slow on 4x, i might have missed one.

          Committed revision 1482474. - trunk ... backporting now.

          Show
          Hoss Man added a comment - Hoss: Thank you for compensating for the shortcomings of my patch. Hey man, you did the hard part – i just gave it a second pair of eyes and noticed the test glitch. Is branch_4x's patch required for committing to 4.3.1? Nope, i'll take care of the backport – but thanks forthe reminder about FST vs Slow on 4x, i might have missed one. Committed revision 1482474. - trunk ... backporting now.
          Hide
          Robert Muir added a comment -

          Thanks to both of you especially for still preserving the checking!!!

          Show
          Robert Muir added a comment - Thanks to both of you especially for still preserving the checking!!!
          Hide
          Hoss Man added a comment -

          patch showing what the backport looks like.

          it's unfortunate there isn't more code reuse between Slow & FST SynonymFactories, but that that can be refactored another day.

          the patch includes some additional test permutations to ensure the param checking works right regardless of which delegate is picked.

          i'm still running full tests & precommit

          Show
          Hoss Man added a comment - patch showing what the backport looks like. it's unfortunate there isn't more code reuse between Slow & FST SynonymFactories, but that that can be refactored another day. the patch includes some additional test permutations to ensure the param checking works right regardless of which delegate is picked. i'm still running full tests & precommit
          Hide
          Robert Muir added a comment -

          backport looks great!

          Show
          Robert Muir added a comment - backport looks great!
          Hide
          Hoss Man added a comment -

          Thanks rmuir,

          Committed revision 1482527 - 4x

          Backport to 4.3.1 was clean, i'll commit there when the tests are done

          Show
          Hoss Man added a comment - Thanks rmuir, Committed revision 1482527 - 4x Backport to 4.3.1 was clean, i'll commit there when the tests are done
          Hide
          Hoss Man added a comment -

          the 4.3.1 backport wasn't as clean as i originally thought, apparently i was depending on some test improvements that were committed to 4x as part of an improvement (LUCENE-3907) so i also merged back just the BaseTokenStreamFactoryTestCase changes from r1479892...

          Committed revision 1482619. - branch 4_3

          Show
          Hoss Man added a comment - the 4.3.1 backport wasn't as clean as i originally thought, apparently i was depending on some test improvements that were committed to 4x as part of an improvement ( LUCENE-3907 ) so i also merged back just the BaseTokenStreamFactoryTestCase changes from r1479892... Committed revision 1482619. - branch 4_3
          Hide
          Christian Moen added a comment -

          Good work. Thanks!

          Show
          Christian Moen added a comment - Good work. Thanks!
          Hide
          Shalin Shekhar Mangar added a comment -

          Bulk close after 4.3.1 release

          Show
          Shalin Shekhar Mangar added a comment - Bulk close after 4.3.1 release

            People

            • Assignee:
              Hoss Man
              Reporter:
              Shingo Sasaki
            • Votes:
              0 Vote for this issue
              Watchers:
              4 Start watching this issue

              Dates

              • Created:
                Updated:
                Resolved:

                Development