  1. Lucene - Core
  2. LUCENE-5818

Fix hunspell zero-string overgeneration

Details

    • Type: Bug
    • Status: Resolved
    • Priority: Major
    • Resolution: Fixed
    • Affects Version/s: None
    • Fix Version/s: 4.10, 6.0
    • Component/s: None
    • Labels: None
    • Lucene Fields: New

    Description

      Currently, we allow stripping suffixes/prefixes all the way down to the empty string. But this is not really permitted, and it creates overgeneration in some cases (especially where endings can also be standalone words; typically these are stopwords, so it causes a lot of damage).

      For example, Czech 'už' should stem only to itself, but today it also stems to 'úžit' because it has a flag compatible with that.
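
      To illustrate the idea, here is a minimal sketch of suffix stripping with a guard against an empty remainder. It is not the Lucene stemmer code or the attached patch: the class ZeroStringStemSketch, the method candidateStems, the allowEmptyRemainder switch, and the toy rule ("ly" appended, "le" restored) are all hypothetical and exist only for illustration.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical sketch; none of these names come from Lucene or the patch.
class ZeroStringStemSketch {

    /**
     * Returns candidate stems for a word. Each rule is {appended, strip}:
     * the suffix "appended" is removed from the word and "strip" is restored.
     * No dictionary or flag check is done here; this only generates candidates.
     */
    static List<String> candidateStems(String word, String[][] suffixRules,
                                       boolean allowEmptyRemainder) {
        List<String> stems = new ArrayList<>();
        stems.add(word); // the word itself is always a candidate
        for (String[] rule : suffixRules) {
            String appended = rule[0];
            String strip = rule[1];
            if (!word.endsWith(appended)) {
                continue; // rule does not apply to this word
            }
            String remainder = word.substring(0, word.length() - appended.length());
            // The guard: refuse rules that would consume the entire word.
            if (remainder.isEmpty() && !allowEmptyRemainder) {
                continue;
            }
            stems.add(remainder + strip);
        }
        return stems;
    }

    public static void main(String[] args) {
        String[][] rules = { {"ly", "le"} }; // toy rule, e.g. "able" -> "ably"
        // Without the guard, the standalone word "ly" overgenerates "le".
        System.out.println(candidateStems("ly", rules, true));
        // With the guard, "ly" stems only to itself.
        System.out.println(candidateStems("ly", rules, false));
        // Ordinary words are unaffected by the guard.
        System.out.println(candidateStems("ably", rules, false));
    }
}
```

      In this sketch the guard only skips rules whose affix is the whole word, so longer words keep their normal stems while a standalone ending is left alone.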

      Attachments

        1. LUCENE-5818.patch
          4 kB
          Robert Muir

          People

            Assignee: Unassigned
            Reporter: Robert Muir (rcmuir)