Lucene - Core / LUCENE-9410

German/French stemmers fail for common forms maux, gegrüßt, grüßend, schlummert


    Details

    • Type: Bug
    • Status: Open
    • Priority: Critical
    • Resolution: Unresolved
    • Affects Version/s: 8.5
    • Fix Version/s: None
    • Component/s: modules/analysis
    • Environment:

      Elasticsearch 7.7.1 running on cloud.elastic.co

    • Lucene Fields:
      New

      Description

      I'm using Lucene via Elasticsearch 7.7.1. The German and French stemmers (whether via the Snowball analyzer or the "light" or "heavy" stemming analyzers) fail to reduce some common inflected forms to the same stem as their base form:

      French:

      • "maux" (plural) should match "mal" (singular) but instead "maux" is unchanged

      German:

      • "schlummert" should match "schlummern" (infinitive) but instead is unchanged
      • "grüßend" should match "grüßen" (infinitive) but instead yields "grussend"
      • "gegrüßt" should match "grüßen" (infinitive) but instead yields "gegrusst"

      The Elasticsearch folks said I should file a bug with Lucene.
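
The four cases listed above can be captured as a small regression check. The sketch below is only an illustration: `find_mismatches` and `reported_stem` are hypothetical names, and `reported_stem` is not a real stemmer. It replays the outputs quoted in this report and returns every other word unchanged, so the base-form stems it produces are an assumption. To test a real stemmer, pass any callable that takes `(language, word)` and returns the stem.

```python
from typing import Callable, List, Tuple

# (language, inflected form, base form) triples taken from this report.
CASES: List[Tuple[str, str, str]] = [
    ("french", "maux", "mal"),
    ("german", "schlummert", "schlummern"),
    ("german", "grüßend", "grüßen"),
    ("german", "gegrüßt", "grüßen"),
]

Stemmer = Callable[[str, str], str]


def find_mismatches(stem: Stemmer, cases=CASES):
    """Return every case whose inflected and base forms stem differently."""
    failures = []
    for language, inflected, base in cases:
        got = stem(language, inflected)
        want = stem(language, base)
        if got != want:
            failures.append((language, inflected, got, base, want))
    return failures


# Stand-in only (assumption): outputs quoted in the report for the inflected
# forms; any other word, including the base forms, is returned unchanged.
REPORTED_OUTPUT = {
    "maux": "maux",
    "schlummert": "schlummert",
    "grüßend": "grussend",
    "gegrüßt": "gegrusst",
}


def reported_stem(language: str, word: str) -> str:
    return REPORTED_OUTPUT.get(word, word)


if __name__ == "__main__":
    for language, inflected, got, base, want in find_mismatches(reported_stem):
        print(f"{language}: {inflected!r} -> {got!r}, but {base!r} -> {want!r}")
```

A stemmer that conflates each pair would make `find_mismatches` return an empty list.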

              People

              • Assignee: Unassigned
              • Reporter: Ben Kazez (bkazez)
              • Votes: 0
              • Watchers: 2

                  Time Tracking

                  • Estimated: Not Specified
                  • Remaining: 0h
                  • Logged: 10m