Lucene - Core / LUCENE-23

GermanStemFilter setting wrong values for startoffset/endoffset of stemmed tokens

Details

    • Type: Bug
    • Status: Closed
    • Priority: Major
    • Resolution: Fixed
    • Affects Version/s: None
    • Fix Version/s: None
    • Component/s: modules/analysis
    • Labels: None
    • Environment: Operating System: Linux
      Platform: PC

    • Bugzilla Id: 7412

    Description

      The GermanStemFilter sets wrong start/end offset values on the new Token object
      it creates when the stemmer succeeds in stemming the termText() string. The bug
      was found in 1.2-RC5-dev.

      -----------------
      Example, for the processing of the string "this is a simple test":
      token : thi (0,3)
      token : is (5,7)
      token : a (8,9)
      token : simpl (0,5)
      token : test (17,21)

      (All the stemmed tokens have wrong start/end offsets: "thi" and "simpl" should
      keep the offsets of "this" and "simple" in the input, (0,4) and (10,16).)
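
The behaviour reported above can be reproduced with a small self-contained sketch. Nothing here is the actual Lucene code: `Tok`, `tokenize`, `toyStem`, `stemBuggy` and `stemFixed` are hypothetical names, the toy stemmer only exists to turn "this" into "thi" and "simple" into "simpl", and the buggy variant (offsets set to 0 and the stem length) is an assumption inferred from the reported output, not taken from the GermanStemFilter source. The fixed variant shows the expected behaviour: a stemmed token keeps the offsets of the original token.

```java
import java.util.ArrayList;
import java.util.List;

public class StemOffsetSketch {

    /** Minimal stand-in for a token: text plus character offsets into the input. */
    public static final class Tok {
        public final String text;
        public final int start; // inclusive
        public final int end;   // exclusive

        public Tok(String text, int start, int end) {
            this.text = text;
            this.start = start;
            this.end = end;
        }

        @Override
        public String toString() {
            return "token : " + text + " (" + start + "," + end + ")";
        }
    }

    /** Splits on spaces and records each word's offsets in the input. */
    public static List<Tok> tokenize(String input) {
        List<Tok> out = new ArrayList<>();
        int pos = 0;
        while (pos < input.length()) {
            if (input.charAt(pos) == ' ') {
                pos++;
                continue;
            }
            int end = pos;
            while (end < input.length() && input.charAt(end) != ' ') {
                end++;
            }
            out.add(new Tok(input.substring(pos, end), pos, end));
            pos = end;
        }
        return out;
    }

    /** Toy stemmer (hypothetical): drops one trailing 's' or 'e' from words longer than 3 chars. */
    public static String toyStem(String s) {
        if (s.length() > 3 && (s.endsWith("s") || s.endsWith("e"))) {
            return s.substring(0, s.length() - 1);
        }
        return s;
    }

    /** Assumed reconstruction of the bug: offsets derived from the stem, not the source token. */
    public static Tok stemBuggy(Tok t) {
        String s = toyStem(t.text);
        return s.equals(t.text) ? t : new Tok(s, 0, s.length());
    }

    /** Expected behaviour: the stemmed token keeps the original start/end offsets. */
    public static Tok stemFixed(Tok t) {
        String s = toyStem(t.text);
        return s.equals(t.text) ? t : new Tok(s, t.start, t.end);
    }

    public static void main(String[] args) {
        for (Tok t : tokenize("this is a simple test")) {
            System.out.println(stemBuggy(t) + "  ->  " + stemFixed(t));
        }
    }
}
```

Offsets matter to anything that maps tokens back to the original text (highlighting, for example), which is why the stemmed token should still point at the span of the unstemmed word.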

          People

            Assignee: java-dev@lucene.apache.org Lucene Developers
            Reporter: reyes@charabia.net Rodrigo Reyes