Lucene - Core
  1. Lucene - Core
  2. LUCENE-6833

Upgrade morfologik to version 2.0.1, simplify MorfologikFilter's dictionary lookup

    Details

    • Type: Improvement Improvement
    • Status: Closed
    • Priority: Minor Minor
    • Resolution: Fixed
    • Affects Version/s: None
    • Fix Version/s: 5.4, 6.0
    • Component/s: None
    • Labels:
      None
    • Lucene Fields:
      New

      Description

      This is a follow-up to Uwe's work on LUCENE-6774.

      This patch updates the code to use Morfologik stemming version 2.0.1, which removes the "automatic" lookup of classpath-relative dictionary resources in favor of an explicit InputStream or URL. So the user code is explicitly responsible to provide these resources, reacting to missing files, etc.

      There were no other "default" dictionaries in Morfologik other than the Polish dictionary so I also cleaned up the filter code from a number of attributes that were, to me, confusing.

      • MorfologikFilterFactory now accepts an (optional) dictionary attribute which contains an explicit name of the dictionary resource to load. The resource is loaded with a ResourceLoader passed to the inform(..) method, so the final location depends on the resource loader.
      • There is no way to load the dictionary and metadata separately (this isn't at all useful).
      • If the dictionary attribute is missing, the filter loads the Polish dictionary by default (since most people would be using Morfologik for stemming Polish anyway).

      This patch is not backward compatible, but it attempts to provide useful feedback on initialization: if the removed attributes were used, it points at this JIRA issue, so it should be clear what to change and how.

      1. LUCENE-6833.patch
        30 kB
        Dawid Weiss
      2. LUCENE-6833.patch
        29 kB
        Dawid Weiss

        Issue Links

          Activity

          Hide
          Uwe Schindler added a comment -

          Hi Dawid,
          I was thinking about removing the classpath resource parameter from the factory in LUCENE-6775, but did not do that for backwards compatibility. But I think we can do that (simplifies the horrible if-then-else checking in the factory parameter validation).

          Show
          Uwe Schindler added a comment - Hi Dawid, I was thinking about removing the classpath resource parameter from the factory in LUCENE-6775 , but did not do that for backwards compatibility. But I think we can do that (simplifies the horrible if-then-else checking in the factory parameter validation).
          Hide
          Uwe Schindler added a comment - - edited

          As 5.4 was not yet released, you can revert LUCENE-6775 and reimplement it based on your suggestion You can still reuse my ResourceLoader stuff and some of the new tests.

          Show
          Uwe Schindler added a comment - - edited As 5.4 was not yet released, you can revert LUCENE-6775 and reimplement it based on your suggestion You can still reuse my ResourceLoader stuff and some of the new tests.
          Hide
          Dawid Weiss added a comment -

          Yeah, I don't think backwards compat is that important in this case, really – if somebody uses custom dictionaries they very well know what they're doing (they'd have to compile them first, which isn't that trivial in pre-2.0.0 morfologik...).

          I surely used your code, it was great help!

          Show
          Dawid Weiss added a comment - Yeah, I don't think backwards compat is that important in this case, really – if somebody uses custom dictionaries they very well know what they're doing (they'd have to compile them first, which isn't that trivial in pre-2.0.0 morfologik...). I surely used your code, it was great help!
          Hide
          Dawid Weiss added a comment -

          Oh, I left LUCENE-6775 without reverting, this patch just overlays some changes on top of that. I think that issue remains valid (adding support for custom dictionaries), it's just done in a slightly different way.

          Show
          Dawid Weiss added a comment - Oh, I left LUCENE-6775 without reverting, this patch just overlays some changes on top of that. I think that issue remains valid (adding support for custom dictionaries), it's just done in a slightly different way.
          Hide
          Uwe Schindler added a comment -

          OK, +1

          Small suggestion:

          +    if (args.isEmpty() ||
          +        (args.size() == 1 && args.containsKey(DICTIONARY_ATTRIBUTE))) {
          +      return;
               }
          

          I'd remove the old style attribute in the existence-check before and leave args.isEmpty() check here.

          Show
          Uwe Schindler added a comment - OK, +1 Small suggestion: + if (args.isEmpty() || + (args.size() == 1 && args.containsKey(DICTIONARY_ATTRIBUTE))) { + return; } I'd remove the old style attribute in the existence-check before and leave args.isEmpty() check here.
          Hide
          Dawid Weiss added a comment -

          I was actually thinking about that, but adding a field just to propagate a single value to the inform method seemed uglier to me than this?

          Show
          Dawid Weiss added a comment - I was actually thinking about that, but adding a field just to propagate a single value to the inform method seemed uglier to me than this?
          Hide
          Uwe Schindler added a comment -

          I was more talking about the removal from the map about the isEmpty() check, which is unrelated to this, because you refer to getOriginalArgs() in the inform.
          I still think you should add the field, that's how all the other factories do it.

          Show
          Uwe Schindler added a comment - I was more talking about the removal from the map about the isEmpty() check, which is unrelated to this, because you refer to getOriginalArgs() in the inform. I still think you should add the field, that's how all the other factories do it.
          Hide
          Dawid Weiss added a comment -

          Moved attribute reading to the constructor. Fixed classpath for tests (includes test-files now).

          Show
          Dawid Weiss added a comment - Moved attribute reading to the constructor. Fixed classpath for tests (includes test-files now).
          Hide
          ASF subversion and git services added a comment -

          Commit 1707457 from Dawid Weiss in branch 'dev/trunk'
          [ https://svn.apache.org/r1707457 ]

          LUCENE-6833: Upgraded Morfologik to version 2.0.1. The 'dictionary' attribute has been
          reverted back and now points at the dictionary resource to be loaded instead of the default Polish dictionary.

          Show
          ASF subversion and git services added a comment - Commit 1707457 from Dawid Weiss in branch 'dev/trunk' [ https://svn.apache.org/r1707457 ] LUCENE-6833 : Upgraded Morfologik to version 2.0.1. The 'dictionary' attribute has been reverted back and now points at the dictionary resource to be loaded instead of the default Polish dictionary.
          Hide
          ASF subversion and git services added a comment -

          Commit 1707458 from Dawid Weiss in branch 'dev/branches/branch_5x'
          [ https://svn.apache.org/r1707458 ]

          LUCENE-6833: Upgraded Morfologik to version 2.0.1. The 'dictionary' attribute has been
          reverted back and now points at the dictionary resource to be loaded instead of the default Polish dictionary.

          Show
          ASF subversion and git services added a comment - Commit 1707458 from Dawid Weiss in branch 'dev/branches/branch_5x' [ https://svn.apache.org/r1707458 ] LUCENE-6833 : Upgraded Morfologik to version 2.0.1. The 'dictionary' attribute has been reverted back and now points at the dictionary resource to be loaded instead of the default Polish dictionary.
          Hide
          Uwe Schindler added a comment -

          Thanks! All fine now.

          Show
          Uwe Schindler added a comment - Thanks! All fine now.
          Hide
          David Smiley added a comment -

          When I try to run the maven build, it fails because essentially morfologik doesn't have "test-files" on the classpath. I had to modify the morfologik pom (via our template) to include this, like so:

          <testResource>
                  <directory>${module-path}/src/test-files</directory>
           </testResource>
          

          (which is done similarly for some other modules, e.g. velocity).

          I'm confused why I'm discovering this now... so strange. Any ideas what's going on? Pending what we find, I plan to commit this fix.

          Show
          David Smiley added a comment - When I try to run the maven build, it fails because essentially morfologik doesn't have "test-files" on the classpath. I had to modify the morfologik pom (via our template) to include this, like so: <testResource> <directory> ${module-path}/src/test-files </directory> </testResource> (which is done similarly for some other modules, e.g. velocity). I'm confused why I'm discovering this now... so strange. Any ideas what's going on? Pending what we find, I plan to commit this fix.
          Hide
          Dawid Weiss added a comment -

          I don't know, I don't use the maven build. The test classpath is added manually in ant too and it doesn't follow the typical Maven convention so I guess you'll have to add it.

          Show
          Dawid Weiss added a comment - I don't know, I don't use the maven build. The test classpath is added manually in ant too and it doesn't follow the typical Maven convention so I guess you'll have to add it.
          Hide
          ASF subversion and git services added a comment -

          Commit 1717271 from David Smiley in branch 'dev/trunk'
          [ https://svn.apache.org/r1717271 ]

          LUCENE-6833: maven build: add test-files/ to morfologik pom

          Show
          ASF subversion and git services added a comment - Commit 1717271 from David Smiley in branch 'dev/trunk' [ https://svn.apache.org/r1717271 ] LUCENE-6833 : maven build: add test-files/ to morfologik pom
          Hide
          ASF subversion and git services added a comment -

          Commit 1717272 from David Smiley in branch 'dev/branches/branch_5x'
          [ https://svn.apache.org/r1717272 ]

          LUCENE-6833: maven build: add test-files/ to morfologik pom

          Show
          ASF subversion and git services added a comment - Commit 1717272 from David Smiley in branch 'dev/branches/branch_5x' [ https://svn.apache.org/r1717272 ] LUCENE-6833 : maven build: add test-files/ to morfologik pom

            People

            • Assignee:
              Dawid Weiss
              Reporter:
              Dawid Weiss
            • Votes:
              0 Vote for this issue
              Watchers:
              3 Start watching this issue

              Dates

              • Created:
                Updated:
                Resolved:

                Development