Lucene - Core / LUCENE-3038

DictionaryCompoundWordTokenFilter fails to create some tokens for final parts of words

    Details

    • Type: Bug
    • Status: Closed
    • Priority: Major
    • Resolution: Duplicate
    • Affects Version/s: 3.1, 4.0-ALPHA
    • Fix Version/s: 3.1, 4.0-ALPHA
    • Component/s: modules/analysis
    • Labels:
      None
    • Lucene Fields:
      New, Patch Available

      Description

      Because of an off-by-one error in DictionaryCompoundWordTokenFilter, a word component that comes last in a compound word does not get a token if its length equals the minimum sub-word length.

      Example:
      min sub-word length: 4
      Dictionary:

      {"alfa", "beta"}

      word: "alfabeta"
      Created tokens:

      {"alfabeta", "alfa"}

      Expected tokens:

      {"alfabeta", "alfa", "beta"}

      I have a patch containing a bugfix and a test case that fails on versions 3.1 and 4.0 (and probably on everything in between and on earlier versions as well).
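
      To make the symptom concrete, here is a minimal, self-contained sketch of how an off-by-one in a loop bound could produce exactly the tokens listed above. It is not the filter's actual source and not the attached patch: the class `CompoundSplitSketch`, the method `decompose`, and the `inclusiveBound` flag are hypothetical names used only for illustration, and the max sub-word length of 15 is an assumed value.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Set;

// Illustrative sketch only; NOT the Lucene implementation.
public class CompoundSplitSketch {

    // Hypothetical helper: emits the whole word, then every dictionary
    // sub-word found by brute-force scanning over start offsets and lengths.
    // inclusiveBound == false reproduces the reported off-by-one symptom.
    static List<String> decompose(String word, Set<String> dict,
                                  int minSubwordSize, int maxSubwordSize,
                                  boolean inclusiveBound) {
        List<String> tokens = new ArrayList<>();
        tokens.add(word);
        int len = word.length();
        // Last start offset that still leaves room for a minimal sub-word.
        int lastStart = len - minSubwordSize;
        for (int i = 0; inclusiveBound ? i <= lastStart : i < lastStart; i++) {
            for (int j = minSubwordSize; j <= maxSubwordSize && i + j <= len; j++) {
                String candidate = word.substring(i, i + j);
                if (dict.contains(candidate)) {
                    tokens.add(candidate);
                }
            }
        }
        return tokens;
    }

    public static void main(String[] args) {
        Set<String> dict = Set.of("alfa", "beta");
        // Exclusive bound: the start offset of "beta" (4) is never visited.
        System.out.println(decompose("alfabeta", dict, 4, 15, false)); // [alfabeta, alfa]
        // Inclusive bound: the trailing minimal-length component is found.
        System.out.println(decompose("alfabeta", dict, 4, 15, true));  // [alfabeta, alfa, beta]
    }
}
```

      With a minimum sub-word length of 4 and the 8-character word "alfabeta", the last valid start offset is 4. A strict `<` comparison stops at offset 3, so a final component of exactly the minimum length is skipped, while longer final components are still found because they start earlier.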

        Attachments

        1. LUCENE-3038.patch
          2 kB
          Filip Svendsen

            People

            • Assignee:
              Unassigned
            • Reporter:
              Filip Svendsen (filipncs)
            • Votes:
              0
            • Watchers:
              1
