Uploaded image for project: 'Lucene - Core'
  1. Lucene - Core
  2. LUCENE-5818

Fix hunspell zero-string overgeneration

    Details

    • Type: Bug
    • Status: Resolved
    • Priority: Major
    • Resolution: Fixed
    • Affects Version/s: None
    • Fix Version/s: 4.10, 6.0
    • Component/s: None
    • Labels:
      None
    • Lucene Fields:
      New

      Description

      Currently, its allowed to strip suffixes/prefixes all the way down to the empty string. But this is not really allowed, and creates overgeneration in some cases (especially where endings can be standalone ... typically these are stopwords so it causes a lot of damage).

      Example is czech 'už' which should just stem to itself, but today also stems to 'úžit' because it has a flag compatible with that.

        Attachments

          Activity

            People

            • Assignee:
              Unassigned
              Reporter:
              rcmuir Robert Muir
            • Votes:
              0 Vote for this issue
              Watchers:
              0 Start watching this issue

              Dates

              • Created:
                Updated:
                Resolved: