Lucene - Core / LUCENE-6833

Upgrade morfologik to version 2.0.1, simplify MorfologikFilter's dictionary lookup


    • Type: Improvement
    • Status: Closed
    • Priority: Minor
    • Resolution: Fixed
    • Affects Version/s: None
    • Fix Version/s: 5.4, 6.0
    • Component/s: None
    • Labels: None
    • Lucene Fields: New


      This is a follow-up to Uwe's work on LUCENE-6774.

      This patch updates the code to use Morfologik stemming version 2.0.1, which removes the "automatic" lookup of classpath-relative dictionary resources in favor of an explicit InputStream or URL. User code is therefore explicitly responsible for providing these resources and for reacting to missing files, etc.

      Morfologik had no "default" dictionaries other than the Polish one, so I also removed a number of attributes from the filter code that were, to me, confusing.

      • MorfologikFilterFactory now accepts an (optional) dictionary attribute which contains an explicit name of the dictionary resource to load. The resource is loaded with a ResourceLoader passed to the inform(..) method, so the final location depends on the resource loader.
      • There is no way to load the dictionary and its metadata separately (loading them separately isn't at all useful).
      • If the dictionary attribute is missing, the filter loads the Polish dictionary by default (since most people would be using Morfologik for stemming Polish anyway).

      This patch is not backward compatible, but it attempts to provide useful feedback on initialization: if the removed attributes were used, it points at this JIRA issue, so it should be clear what to change and how.
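
      The attribute handling described above can be modeled in a few lines. The following is only a minimal, standalone sketch and not the code from the patch: the class name, the "removed-attribute" placeholder (the description does not name the removed attributes) and the default-dictionary marker are all assumptions made for illustration. Only the "dictionary" attribute name and the pointer to this issue come from the description.

```java
import java.util.Arrays;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

/** Standalone model of the attribute handling; it does not use any Lucene classes. */
final class DictionaryLookupSketch {
  /** The optional attribute named in the issue description. */
  static final String DICTIONARY_ATTRIBUTE = "dictionary";

  /** Hypothetical placeholder: the issue does not list the removed attribute names. */
  static final List<String> REMOVED_ATTRIBUTES = Arrays.asList("removed-attribute");

  /** Hypothetical marker standing in for the bundled Polish dictionary. */
  static final String DEFAULT_POLISH = "<default Polish dictionary>";

  /**
   * Validates the attribute map and returns the name of the dictionary
   * resource that would then be handed to the resource loader.
   */
  static String resolveResourceName(Map<String, String> args) {
    // Fail early, with a pointer to this issue, if a removed attribute is used.
    for (String removed : REMOVED_ATTRIBUTES) {
      if (args.containsKey(removed)) {
        throw new IllegalArgumentException(
            "The '" + removed + "' attribute is no longer supported; use '"
                + DICTIONARY_ATTRIBUTE + "' instead (see LUCENE-6833).");
      }
    }
    // A missing (or empty) dictionary attribute falls back to the Polish dictionary.
    String name = args.get(DICTIONARY_ATTRIBUTE);
    return (name == null || name.isEmpty()) ? DEFAULT_POLISH : name;
  }

  public static void main(String[] unused) {
    Map<String, String> args = new HashMap<>();
    System.out.println(resolveResourceName(args));

    args.put(DICTIONARY_ATTRIBUTE, "custom.dict");
    System.out.println(resolveResourceName(args));
  }
}
```

      In the real factory the returned name would be resolved by the ResourceLoader passed to inform(..), so where "custom.dict" is actually read from depends on that loader.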


        Attachments

        1. LUCENE-6833.patch (30 kB, Dawid Weiss)
        2. LUCENE-6833.patch (29 kB, Dawid Weiss)

              Assignee: dweiss Dawid Weiss
              Reporter: dweiss Dawid Weiss