Description
This is a follow-up to Uwe's work on LUCENE-6774.
This patch updates the code to use Morfologik stemming version 2.0.1, which removes the "automatic" lookup of classpath-relative dictionary resources in favor of an explicit InputStream or URL. User code is therefore explicitly responsible for providing these resources, reacting to missing files, etc.
Morfologik had no "default" dictionaries other than the Polish one, so I also removed a number of filter attributes that were, to me, confusing.
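On the caller's side, this means resolving the dictionary location yourself and deciding what a missing file means, instead of relying on the library to find it. The following is a minimal sketch using only the JDK. The resource path is hypothetical, and the commented-out `Dictionary.read(url)` call assumes the `URL`-based reader of morfologik-stemming 2.x is on the classpath.

```java
import java.net.URL;

public class ExplicitDictionaryLookup {

  /**
   * Resolves a dictionary resource relative to the given class and fails fast
   * if it is missing. With Morfologik 2.x this step belongs to the caller.
   */
  static URL requireResource(Class<?> anchor, String name) {
    URL url = anchor.getResource(name); // returns null when nothing matches
    if (url == null) {
      throw new IllegalStateException("Dictionary resource not found: " + name);
    }
    return url;
  }

  public static void main(String[] args) {
    try {
      // Hypothetical resource path, for illustration only.
      URL url = requireResource(ExplicitDictionaryLookup.class, "/hypothetical/custom.dict");
      System.out.println("Would load dictionary from " + url);
      // Assuming morfologik-stemming 2.x is available:
      // Dictionary dict = Dictionary.read(url);
    } catch (IllegalStateException e) {
      System.out.println(e.getMessage());
    }
  }
}
```

Reporting the missing file at lookup time keeps the error close to the configuration that caused it, rather than surfacing later as a failed stemming call.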
- MorfologikFilterFactory now accepts an (optional) dictionary attribute which contains an explicit name of the dictionary resource to load. The resource is loaded with a ResourceLoader passed to the inform(..) method, so the final location depends on the resource loader.
- There is no longer a way to load the dictionary and its metadata separately (this was not useful anyway).
- If the dictionary attribute is missing, the filter loads the Polish dictionary by default (since most people would be using Morfologik for stemming Polish anyway).
This patch is not backward compatible, but it attempts to provide useful feedback on initialization: if the removed attributes were used, it points at this JIRA issue, so it should be clear what to change and how.
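The attribute handling described above can be modeled in plain Java. This is a hypothetical sketch, not Lucene's `MorfologikFilterFactory`: the `ResourceLoader` interface is a minimal stand-in for Lucene's, `legacy-attribute` is a placeholder for whichever attributes were actually removed, pairing `foo.dict` with a `foo.info` metadata file is an assumption about how a single `dictionary` name can cover both files, and raw bytes stand in for parsing a real Morfologik dictionary.

```java
import java.io.IOException;
import java.io.InputStream;
import java.util.HashMap;
import java.util.Map;
import java.util.Set;

/** Sketch of the described factory logic; names below are assumptions, not Lucene's API. */
public class MorfologikFactorySketch {

  /** Hypothetical stand-in for Lucene's ResourceLoader. */
  interface ResourceLoader {
    InputStream openResource(String name) throws IOException;
  }

  /** The optional attribute naming the dictionary resource. */
  static final String DICTIONARY_ATTRIBUTE = "dictionary";

  /** Hypothetical placeholder for the attributes this change removed. */
  static final Set<String> REMOVED_ATTRIBUTES = Set.of("legacy-attribute");

  private final String resourceName; // null means: fall back to the Polish dictionary
  private byte[] dictionaryBytes;
  private byte[] metadataBytes;

  MorfologikFactorySketch(Map<String, String> args) {
    Map<String, String> rest = new HashMap<>(args);
    for (String removed : REMOVED_ATTRIBUTES) {
      if (rest.containsKey(removed)) {
        // Fail at initialization and tell the user what replaced the attribute.
        throw new IllegalArgumentException("The '" + removed + "' attribute is no longer supported; use '"
            + DICTIONARY_ATTRIBUTE + "' instead (see the JIRA issue that removed it).");
      }
    }
    resourceName = rest.remove(DICTIONARY_ATTRIBUTE);
    if (!rest.isEmpty()) {
      throw new IllegalArgumentException("Unknown parameters: " + rest);
    }
  }

  /** Assumed naming convention: "foo.dict" is paired with "foo.info". */
  static String metadataNameFor(String dictName) {
    String base = dictName.endsWith(".dict")
        ? dictName.substring(0, dictName.length() - ".dict".length())
        : dictName;
    return base + ".info";
  }

  /** Loads the dictionary through whatever loader the caller supplies. */
  void inform(ResourceLoader loader) throws IOException {
    if (resourceName == null) {
      return; // the real filter would use Morfologik's bundled Polish dictionary here
    }
    // Both files come from the same loader; a missing file surfaces as an IOException.
    try (InputStream dict = loader.openResource(resourceName);
         InputStream meta = loader.openResource(metadataNameFor(resourceName))) {
      dictionaryBytes = dict.readAllBytes();
      metadataBytes = meta.readAllBytes();
    }
  }

  boolean usesDefaultPolish() { return resourceName == null; }
  byte[] dictionaryBytes() { return dictionaryBytes; }
  byte[] metadataBytes() { return metadataBytes; }
}
```

Because the loader is passed in rather than hard-wired, the same `dictionary` value can resolve against a configuration directory, the classpath, or an in-memory map in tests; where the file actually comes from is decided entirely by the `ResourceLoader` handed to `inform(..)`.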
Attachments
Issue Links
- contains
  - SOLR-7792 Upgrade morfologik-stemming to version 1.10.0 (Closed)
  - LUCENE-6775 Improve MorfologikFilterFactory to allow arbitrary dictionaries from ResourceLoader (Closed)
- relates to
  - LUCENE-6774 Remove solr hack in MorfologikFilter (Resolved)
  - SOLR-7790 Update Carrot2 clustering contrib to version 3.10.4 (Closed)