Lucene - Core / LUCENE-4063

FrenchLightStemmer performs abusive compression of (arbitrary) repeated characters in long tokens

Details

    • Type: Improvement
    • Status: Closed
    • Priority: Minor
    • Resolution: Fixed
    • Affects Version/s: 3.4, 4.0-ALPHA
    • Fix Version/s: 4.0-ALPHA
    • Component/s: modules/analysis
    • Labels: None
    • Lucene Fields: New, Patch Available

    Description

      FrenchLightStemmer aggressively deletes repeated characters in long tokens, collapsing each run of identical consecutive characters to a single character, even within numbers.
      This may be unexpected in full-text search.
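A minimal Java sketch of the behavior described above, for illustration only. It is a hypothetical reimplementation, not Lucene's FrenchLightStemmer source: the class RepeatCollapseSketch, the method collapse, the lettersOnly flag and the four-character length threshold are all assumptions, the threshold being chosen to reflect the "long tokens" in the issue title. With lettersOnly = false it collapses every run of identical characters, so "1000000" becomes "10". With lettersOnly = true it shows one possible remedy: only runs of letters are collapsed, so numbers survive intact.

```java
// Hypothetical sketch; NOT the actual Lucene FrenchLightStemmer implementation.
final class RepeatCollapseSketch {

    // Assumed threshold: tokens of this length or shorter are left untouched.
    static final int MAX_UNTOUCHED_LEN = 4;

    /**
     * Collapses each run of identical consecutive characters to one character.
     * If lettersOnly is true, runs of non-letters (e.g. digits) are preserved.
     */
    static String collapse(String token, boolean lettersOnly) {
        if (token.length() <= MAX_UNTOUCHED_LEN) {
            return token; // short tokens are never modified
        }
        StringBuilder out = new StringBuilder(token.length());
        char prev = token.charAt(0);
        out.append(prev);
        for (int i = 1; i < token.length(); i++) {
            char c = token.charAt(i);
            boolean repeated = (c == prev);
            if (repeated && (!lettersOnly || Character.isLetter(c))) {
                continue; // drop the duplicate character
            }
            out.append(c);
            prev = c;
        }
        return out.toString();
    }

    public static void main(String[] args) {
        String[] samples = {"1000000", "2000", "baaaah"};
        for (String s : samples) {
            System.out.println(s + " -> naive: " + collapse(s, false)
                    + ", letters only: " + collapse(s, true));
        }
    }
}
```

In this sketch, restricting the collapse to Character.isLetter keeps elongated words such as "baaaah" normalized to "bah" while leaving digit sequences such as "1000000" unchanged; whether the committed fix took exactly this approach is not stated here.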

      Attachments

        1. SOLR-3463.patch (2 kB, Tanguy Moal)
        2. SOLR-3463.patch (2 kB, Tanguy Moal)
        3. SOLR-3463.patch (2 kB, Tanguy Moal)
        4. LUCENE-4063.patch (3 kB, Steven Rowe)


          People

            Assignee: Steven Rowe (sarowe)
            Reporter: Tanguy Moal (tanguy)
            Votes: 0
            Watchers: 2
