Lucene - Core / LUCENE-5484

Distinct control of recursion levels for prefix and suffix in Hunspell.

Details

    • Type: Improvement
    • Status: Resolved
    • Priority: Minor
    • Resolution: Not A Problem
    • Affects Version/s: None
    • Fix Version/s: None
    • Component/s: modules/analysis
    • Labels: None
    • Lucene Fields: New

    Description

      Currently, the recursionCap option controls the depth of recursion in the Hunspell token filter. This recursion applies an allowed affix rule to the input token and then passes the output token(s) back in as input, recursively.

      However, recursionCap does not distinguish between how many prefix rules and how many suffix rules were applied; it only counts the total. For example, if recursionCap is set to 1, all of the following combinations are allowed:

      • 2 prefix rules, 0 suffix rules
      • 1 prefix rule, 1 suffix rule
      • 0 prefix rules, 2 suffix rules
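
      To make the counting concrete, here is a minimal sketch that enumerates the (prefix, suffix) rule counts reachable when only the total is capped. It is not Lucene code: the class and method names are hypothetical, and it assumes, as the example above implies, that recursionCap = n permits up to n + 1 rule applications in total.

```java
import java.util.ArrayList;
import java.util.List;

// Illustration only: hypothetical helper, not part of Lucene.
public class TotalCapSketch {

    // Returns every {prefixCount, suffixCount} pair allowed by a single total cap.
    // Assumption: recursionCap = n allows up to n + 1 rule applications overall.
    static List<int[]> allowedCombinations(int recursionCap) {
        int maxRules = recursionCap + 1;
        List<int[]> result = new ArrayList<>();
        for (int p = 0; p <= maxRules; p++) {
            for (int s = 0; p + s <= maxRules; s++) {
                result.add(new int[] {p, s});
            }
        }
        return result;
    }

    public static void main(String[] args) {
        for (int[] c : allowedCombinations(1)) {
            System.out.println(c[0] + " prefix rule(s), " + c[1] + " suffix rule(s)");
        }
    }
}
```

      With recursionCap = 1 the output includes the three two-rule combinations listed above, along with the shorter ones; there is no way to exclude only the "0 prefix, 2 suffix" case.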

      In some cases it is necessary to distinguish between prefix rules and suffix rules and to have finer control over how many times each is applied. The requested feature should allow setting the recursion level separately for prefix and suffix rules.

      A specific example is the Czech dictionary, which gives the best results if suffix rules are applied only once, hence recursionCap = 0. But then, if a prefix rule is applied to the input token, no suffix rule can be applied afterwards, and the filter produces a token that is not in root form. Setting recursionCap = 1 instead produces so many irrelevant tokens that the Hunspell token filter becomes useless. A good solution would be to tell the Hunspell token filter to apply at most 1 prefix rule and at most 1 suffix rule (that is, never 0 prefix rules and 2 suffix rules).
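
      The requested behaviour can be sketched with a toy affix stripper that keeps two separate budgets. This is only an illustration under stated assumptions: the class, the method, and the English-like affixes are hypothetical, and the sketch skips the dictionary lookups and rule conditions that the real Hunspell filter performs.

```java
import java.util.Arrays;
import java.util.LinkedHashSet;
import java.util.List;
import java.util.Set;

// Illustration only: toy affix stripping, not the Lucene Hunspell implementation.
public class SeparateCapsSketch {

    // Hypothetical toy rules; a real affix file is far richer.
    static final List<String> PREFIXES = Arrays.asList("un");
    static final List<String> SUFFIXES = Arrays.asList("ness", "ful");

    // Collects the word plus every form reachable by stripping at most
    // prefixesLeft prefixes and at most suffixesLeft suffixes.
    static Set<String> strip(String word, int prefixesLeft, int suffixesLeft) {
        Set<String> out = new LinkedHashSet<>();
        out.add(word);
        if (prefixesLeft > 0) {
            for (String p : PREFIXES) {
                if (word.startsWith(p) && word.length() > p.length()) {
                    // Only the prefix budget is decremented here.
                    out.addAll(strip(word.substring(p.length()), prefixesLeft - 1, suffixesLeft));
                }
            }
        }
        if (suffixesLeft > 0) {
            for (String s : SUFFIXES) {
                if (word.endsWith(s) && word.length() > s.length()) {
                    // Only the suffix budget is decremented here.
                    String stem = word.substring(0, word.length() - s.length());
                    out.addAll(strip(stem, prefixesLeft, suffixesLeft - 1));
                }
            }
        }
        return out;
    }

    public static void main(String[] args) {
        // At most 1 prefix and 1 suffix: "help" is not reachable.
        System.out.println(strip("unhelpfulness", 1, 1));
    }
}
```

      With budgets (1, 1) the word "unhelpfulness" can lose "un" and "ness" but never two suffixes, so the over-stripped form "help" is not produced. A single total cap of two rule applications would also admit the two-suffix path.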

      Generally, this probably depends a lot on how a particular dictionary and its affix rules are constructed, so it might be considered an expert feature rather than a general one.

      (There was some relevant discussion going on in LUCENE-5468)

          People

            Assignee: Unassigned
            Reporter: Lukas Vlcek (lukas.vlcek)