  1. Lucene - Core
  2. LUCENE-5826

Support proper hunspell case handling and related options

    Details

    • Type: Improvement
    • Status: Resolved
    • Priority: Major
    • Resolution: Fixed
    • Affects Version/s: None
    • Fix Version/s: 4.10, 6.0
    • Component/s: None
    • Labels: None
    • Lucene Fields: New

      Description

      When ignoreCase=false, we should accept title-cased and upper-cased forms of dictionary words, just like hunspell -m does. Furthermore, a few options affect this behavior:

      • LANG: can turn on alternate casing for Turkish/Azeri
      • KEEPCASE: can prevent acceptance of title/upper cased forms for words
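      The case fallback described above can be sketched roughly as follows. This is a minimal illustration, not the code from the patch: the class `CaseFallbackSketch`, the toy dictionary `DICT`, and the methods `accepted` and `lookup` are all hypothetical names, and the case classification is deliberately simplified (it ignores surrogate pairs, for example). The exact form is tried first; only if it fails are the title-cased and lower-cased variants tried, and entries marked KEEPCASE are refused for those variants.

```java
import java.util.Locale;
import java.util.Map;

/** Hypothetical sketch of title/upper-case fallback with KEEPCASE; not Lucene's actual code. */
final class CaseFallbackSketch {
    /** Toy dictionary (an assumption for illustration): entry -> true if it carries KEEPCASE. */
    static final Map<String, Boolean> DICT =
        Map.of("hello", false, "Paris", false, "kg", true);

    /** An entry matches unless we arrived via a case variant and it is marked KEEPCASE. */
    static boolean accepted(String form, boolean caseVariant) {
        Boolean keepCase = DICT.get(form);
        if (keepCase == null) {
            return false;
        }
        return !(caseVariant && keepCase);
    }

    /** Returns the dictionary form that matches the input, or null if none does. */
    static String lookup(String word, Locale locale) {
        if (word.isEmpty()) {
            return null;
        }
        if (accepted(word, false)) {
            return word; // exact match, no case variation involved
        }
        String lower = word.toLowerCase(locale);
        String upper = word.toUpperCase(locale);
        String title = word.substring(0, 1).toUpperCase(locale)
            + word.substring(1).toLowerCase(locale);

        if (word.equals(upper) && !word.equals(lower)) {
            // ALL CAPS input: try the title-cased form, then the lower-cased form
            if (accepted(title, true)) {
                return title;
            }
            if (accepted(lower, true)) {
                return lower;
            }
        } else if (word.equals(title) && !word.equals(lower)) {
            // Title-cased input: try the lower-cased form
            if (accepted(lower, true)) {
                return lower;
            }
        }
        return null; // lower-case or mixed-case input gets no fallback
    }
}
```

      With this toy dictionary, "Hello" and "HELLO" both resolve to "hello", "PARIS" resolves to "Paris", "paris" is rejected because lower-case input is never upgraded, and "Kg"/"KG" are rejected because the "kg" entry is marked KEEPCASE.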

      While we are here setting up the same logic anyway, add support for similar options:

      • NEEDAFFIX/PSEUDOROOT: form is invalid without being affixed
      • ONLYINCOMPOUND: form/affixes only make sense inside compounds.
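      As a rough illustration of how these two options restrict which forms are valid, consider the sketch below. It assumes the general hunspell semantics stated in the bullets above; the class `EntryFlagsSketch`, the flag characters 'X' and 'Y', and the method `validForm` are hypothetical and do not reflect how the patch represents flags internally.

```java
import java.util.Set;

/** Hypothetical sketch of NEEDAFFIX / ONLYINCOMPOUND checks; not Lucene's actual code. */
final class EntryFlagsSketch {
    // Flag characters are assumptions; a real .aff file declares its own.
    static final char NEEDAFFIX = 'X';
    static final char ONLYINCOMPOUND = 'Y';

    /**
     * Decides whether a dictionary entry may be used for a candidate form.
     *
     * @param flags        flags attached to the dictionary entry
     * @param affixApplied true if at least one affix was stripped to reach the entry
     * @param inCompound   true if the entry is being used as part of a compound
     */
    static boolean validForm(Set<Character> flags, boolean affixApplied, boolean inCompound) {
        if (flags.contains(NEEDAFFIX) && !affixApplied) {
            return false; // a pseudo-root is not a word on its own
        }
        if (flags.contains(ONLYINCOMPOUND) && !inCompound) {
            return false; // only meaningful inside a compound
        }
        return true;
    }
}
```

      A caller that never analyzes compounds would always pass `inCompound = false`, which means entries marked ONLYINCOMPOUND are simply never accepted.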

      None of this is related to the ignoreCase=true option. However, if you do use that option, it now applies the correct alternate casing for tr_TR/az_AZ.
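      To show why alternate casing matters for Turkish and Azeri, here is a small sketch using the JDK's locale-aware lower-casing. It only demonstrates the casing rule itself (dotted and dotless i); the class `TurkishCasingSketch` and the `alternateCasing` parameter are hypothetical, and the patch may well implement the folding differently.

```java
import java.util.Locale;

/** Hypothetical sketch of Turkish/Azeri alternate casing; not Lucene's actual code. */
final class TurkishCasingSketch {
    private static final Locale TURKISH = Locale.forLanguageTag("tr");

    /**
     * Lower-cases a word. With alternateCasing, 'I' maps to dotless i (U+0131)
     * and dotted capital I (U+0130) maps to plain 'i', as Turkish and Azeri require.
     */
    static String lower(String word, boolean alternateCasing) {
        return word.toLowerCase(alternateCasing ? TURKISH : Locale.ROOT);
    }
}
```

      For example, `lower("ISPARTA", true)` starts with the dotless i (U+0131), whereas `lower("ISPARTA", false)` starts with an ordinary 'i', so a Turkish dictionary entry would only match the former.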

      I haven't implemented CHECKSHARPS yet because it seems more complicated; I first have to figure out what the logic there should be.

            People

            • Assignee: Unassigned
            • Reporter: Robert Muir (rcmuir)