Details

    • Type: Bug
    • Status: Resolved
    • Priority: Major
    • Resolution: Fixed
    • Affects Version/s: None
    • Fix Version/s: 4.10, 6.0
    • Component/s: None
    • Labels:
      None
    • Lucene Fields:
      New

      Description

      Currently, it is possible to strip suffixes/prefixes all the way down to the empty string. This should not be allowed: it causes overgeneration in some cases, especially where endings can also be standalone words. These are typically stopwords, so it causes a lot of damage.

      For example, Czech 'už' should stem only to itself, but today it also stems to 'úžit' because it has a flag compatible with that.
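
      The check described above can be sketched roughly as follows. This is a minimal illustration, not the code from the patch: the class `AffixStripSketch`, the method `undoSuffix`, and the toy rule ("ies" replacing "y") are all hypothetical, and the dictionary and flag lookup that would normally follow is left out.

```java
import java.util.Optional;

// Hypothetical sketch; names and the toy rule are assumptions, not Lucene API.
public class AffixStripSketch {

    /**
     * Undo one suffix rule: remove `affix` from the end of `word` and put
     * back `restored` (the text the rule stripped when it was applied).
     * Returns empty if the rule does not apply, or if it would consume the
     * whole word while `allowEmptyRemainder` is false.
     */
    public static Optional<String> undoSuffix(String word, String affix,
                                              String restored,
                                              boolean allowEmptyRemainder) {
        if (!word.endsWith(affix)) {
            return Optional.empty();
        }
        String remainder = word.substring(0, word.length() - affix.length());
        if (remainder.isEmpty() && !allowEmptyRemainder) {
            // The affix is the entire word: reject instead of overgenerating.
            return Optional.empty();
        }
        return Optional.of(remainder + restored);
    }

    public static void main(String[] args) {
        // Normal case: part of the word survives the stripping.
        System.out.println(undoSuffix("flies", "ies", "y", false)); // Optional[fly]
        // Old behavior: a standalone ending still yields a candidate stem.
        System.out.println(undoSuffix("ies", "ies", "y", true));    // Optional[y]
        // With the check: the same input produces no candidate.
        System.out.println(undoSuffix("ies", "ies", "y", false));   // Optional.empty
    }
}
```

      With the check enabled, a word that consists only of an ending is left to stem to itself, which is the behavior expected for 'už'.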

        Activity

        Robert Muir added a comment -

        Simple patch with some tests. This might be a bug I introduced when cutting over to FST, because we had no test for it before.

        Hoss Man added a comment -

        For those keeping score at home... http://svn.apache.org/r1609738 http://svn.apache.org/r1609739

          People

          • Assignee:
            Unassigned
            Reporter:
            Robert Muir
          • Votes:
            0
            Watchers:
            0
