  1. Lucene - Core
  2. LUCENE-3038

DictionaryCompoundWordTokenFilter fails to create some tokens for final parts of words

Details

    • Type: Bug
    • Status: Closed
    • Priority: Major
    • Resolution: Duplicate
    • Affects Version/s: 3.1, 4.0-ALPHA
    • Fix Version/s: 3.1, 4.0-ALPHA
    • Component/s: modules/analysis
    • Labels: None
    • Lucene Fields: New, Patch Available

    Description

      DictionaryCompoundWordTokenFilter: Due to an off-by-one error, a word component that comes last in a compound word does not get a token if its length equals the minimum sub-word length.

      Example:
      min sub-word length: 4
      Dictionary:

      {"alfa", "beta"}

      word: "alfabeta"
      Created tokens:

      {"alfabeta", "alfa"}

      Expected tokens:

      {"alfabeta", "alfa", "beta"}

      I have a patch containing a bugfix and a test case that fails on versions 3.1 and 4.0 (and probably on every version in between, as well as on earlier versions).
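
      The off-by-one described above can be illustrated with a small standalone sketch. It is not the Lucene source and does not use the Lucene API: the class CompoundSketch, the method decompose, and the inclusiveBound flag are hypothetical names introduced only for this illustration, and the loop is an assumption about the kind of bound that would produce the reported behavior. With an exclusive upper bound on the start offset, the last start position at which a sub-word of minimum length still fits is never visited.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.HashSet;
import java.util.List;
import java.util.Set;

public class CompoundSketch {

    // Hypothetical helper, not part of Lucene: returns the original word
    // followed by every dictionary entry found inside it.
    static List<String> decompose(String word, Set<String> dictionary,
                                  int minSubwordSize, int maxSubwordSize,
                                  boolean inclusiveBound) {
        List<String> tokens = new ArrayList<>();
        tokens.add(word);

        // Last start offset at which a sub-word of minimum length still fits.
        int lastStart = word.length() - minSubwordSize;

        // Assumed buggy form: i < lastStart (skips the final candidate).
        // Assumed fixed form: i <= lastStart.
        for (int i = 0; inclusiveBound ? i <= lastStart : i < lastStart; i++) {
            for (int len = minSubwordSize;
                 len <= maxSubwordSize && i + len <= word.length();
                 len++) {
                String candidate = word.substring(i, i + len);
                if (dictionary.contains(candidate)) {
                    tokens.add(candidate);
                }
            }
        }
        return tokens;
    }

    public static void main(String[] args) {
        Set<String> dictionary = new HashSet<>(Arrays.asList("alfa", "beta"));

        // Exclusive bound: "beta" starts at offset 4, which is never visited.
        System.out.println(decompose("alfabeta", dictionary, 4, 15, false));
        // prints [alfabeta, alfa]

        // Inclusive bound: offset 4 is visited and "beta" is emitted.
        System.out.println(decompose("alfabeta", dictionary, 4, 15, true));
        // prints [alfabeta, alfa, beta]
    }
}
```

      For "alfabeta" with a minimum sub-word length of 4, the last valid start offset is 8 - 4 = 4, which is exactly where "beta" begins. That is why the problem only shows up when the final component has exactly the minimum length.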

      Attachments

        1. LUCENE-3038.patch
          2 kB
          Filip Svendsen

          People

            Assignee: Unassigned
            Reporter: Filip Svendsen (filipncs)
            Votes: 0
            Watchers: 0
