Lucene - Core / LUCENE-1190

a lexicon object for merging spellchecker and synonyms from stemming

    Details

    • Type: New Feature
    • Status: Resolved
    • Priority: Major
    • Resolution: Won't Fix
    • Affects Version/s: 2.3
    • Fix Version/s: None
    • Component/s: core/search, modules/other
    • Labels:
      None
    • Lucene Fields:
      New, Patch Available

      Description

      Some Lucene features need a list of reference words. Spellchecking is the basic example, but synonyms are another use. Other tools, such as stemming and other word simplifications (anagram, phonetic, ...), work more smoothly with a list of words, without disturbing the main index.
      For that, I suggest a Lexicon object, which contains words (Term + frequency) and can be built from a Lucene Directory or from plain text files.
      Classic TokenFilters can be used with a Lexicon (LowerCaseFilter and ISOLatin1AccentFilter should be the most useful).
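      Building a Lexicon from plain text, with the kind of normalization LowerCaseFilter and ISOLatin1AccentFilter apply, can be sketched in plain Java. This is a minimal sketch, not Lucene code: the class PlainTextLexicon and its methods are hypothetical, splitting on non-letters stands in for a real Tokenizer, and NFD decomposition plus removal of combining marks only approximates ISOLatin1AccentFilter's mapping table.

```java
import java.text.Normalizer;
import java.util.Locale;
import java.util.Map;
import java.util.TreeMap;

/**
 * Hypothetical sketch: builds a word -> frequency lexicon from plain text.
 * Lowercasing and accent stripping stand in for LowerCaseFilter and
 * ISOLatin1AccentFilter; this is plain Java, not the Lucene API.
 */
final class PlainTextLexicon {

    /** Lowercases a token and removes combining diacritical marks (e-acute becomes e). */
    static String normalize(String token) {
        String lower = token.toLowerCase(Locale.ROOT);
        String decomposed = Normalizer.normalize(lower, Normalizer.Form.NFD);
        return decomposed.replaceAll("\\p{M}+", "");
    }

    /** Splits text on runs of non-letters and counts each normalized word. */
    static Map<String, Integer> build(String text) {
        Map<String, Integer> lexicon = new TreeMap<>();
        for (String token : text.split("[^\\p{L}]+")) {
            if (token.isEmpty()) {
                continue; // split() yields a leading "" when text starts with a separator
            }
            lexicon.merge(normalize(token), 1, Integer::sum);
        }
        return lexicon;
    }

    public static void main(String[] args) {
        System.out.println(build("Le caf\u00e9, les caf\u00e9s et le CAFE."));
    }
}
```

      On "Le café, les cafés et le CAFE." the accented and upper-case forms collapse into the same entries, so cafe and le each end up with a frequency of 2.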
      A Lexicon is stored in a Lucene Directory: each word is a Document, and each piece of metadata is a Field (word, ngram, phonetic, fields, anagram, size, ...).
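      One way to picture "each word is a Document, each piece of metadata is a Field" is a per-word map of derived values. The sketch below assumes character trigrams for the ngram field, sorted letters for the anagram field and character length for size; the class LexiconEntry is hypothetical, the phonetic and fields metadata are left out, and in the proposal these values would be Lucene Fields rather than a Java Map.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

/**
 * Hypothetical sketch of one lexicon entry: a word plus derived metadata.
 * Field names follow the issue text; phonetic and fields are omitted.
 * Not Lucene code: a Map stands in for a Document.
 */
final class LexiconEntry {

    /** Character n-grams of length n; n = 3 on "lucene" gives luc, uce, cen, ene. */
    static List<String> ngrams(String word, int n) {
        List<String> grams = new ArrayList<>();
        for (int i = 0; i + n <= word.length(); i++) {
            grams.add(word.substring(i, i + n));
        }
        return grams;
    }

    /** Anagram key: the word's letters sorted, so "listen" and "silent" share a key. */
    static String anagramKey(String word) {
        char[] letters = word.toCharArray();
        Arrays.sort(letters);
        return new String(letters);
    }

    /** Builds the field values that would be stored for one word, in insertion order. */
    static Map<String, String> toFields(String word) {
        Map<String, String> fields = new LinkedHashMap<>();
        fields.put("word", word);
        fields.put("ngram", String.join(" ", ngrams(word, 3)));
        fields.put("anagram", anagramKey(word));
        fields.put("size", Integer.toString(word.length()));
        return fields;
    }

    public static void main(String[] args) {
        System.out.println(toFields("lucene"));
    }
}
```

      Storing derived keys such as the anagram key as separate fields means a lookup like "all anagrams of this word" becomes an exact match on one field instead of a scan of the whole word list.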
      Above a minimum size, the number of distinct words used in an index can be considered stable, so a standard Lexicon (built from Wikipedia, for example) can be reused.
      A similarTokenFilter is provided.
      A spellchecker will come soon.
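      A spellchecker and a similar-token filter both need to find lexicon words that look like an input word. The following is a minimal sketch, assuming trigram overlap scored with the Dice coefficient and ties broken by word frequency. The n-gram idea is also what Lucene's contrib SpellChecker indexes, but NgramSuggester, the "_" padding and this scoring are illustrative assumptions, not the patch's implementation.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.HashSet;
import java.util.List;
import java.util.Map;
import java.util.Set;

/**
 * Hypothetical sketch: suggests lexicon words similar to an input word.
 * Similarity is the Dice coefficient over padded character trigrams;
 * ties are broken by word frequency. Not the patch's actual algorithm.
 */
final class NgramSuggester {

    /** Trigrams of the word padded with '_' so the first and last letters weigh in. */
    static Set<String> trigrams(String word) {
        String padded = "_" + word + "_";
        Set<String> grams = new HashSet<>();
        for (int i = 0; i + 3 <= padded.length(); i++) {
            grams.add(padded.substring(i, i + 3));
        }
        return grams;
    }

    /** Dice coefficient: 2 * |common| / (|A| + |B|), between 0 and 1. */
    static double similarity(String a, String b) {
        Set<String> ga = trigrams(a);
        Set<String> gb = trigrams(b);
        if (ga.isEmpty() && gb.isEmpty()) {
            return 0.0; // two empty words: nothing to compare
        }
        Set<String> common = new HashSet<>(ga);
        common.retainAll(gb);
        return 2.0 * common.size() / (ga.size() + gb.size());
    }

    /** Up to max words sharing at least one trigram with input, best first. */
    static List<String> suggest(String input, Map<String, Integer> lexicon, int max) {
        List<String> candidates = new ArrayList<>();
        for (String word : lexicon.keySet()) {
            if (!word.equals(input) && similarity(input, word) > 0) {
                candidates.add(word);
            }
        }
        candidates.sort((a, b) -> {
            int bySimilarity = Double.compare(similarity(input, b), similarity(input, a));
            if (bySimilarity != 0) {
                return bySimilarity; // higher similarity first
            }
            return Integer.compare(lexicon.get(b), lexicon.get(a)); // then more frequent first
        });
        return new ArrayList<>(candidates.subList(0, Math.min(max, candidates.size())));
    }

    public static void main(String[] args) {
        Map<String, Integer> lexicon = new HashMap<>();
        lexicon.put("lucene", 50);
        lexicon.put("lucerne", 5);
        lexicon.put("license", 20);
        lexicon.put("search", 30);
        System.out.println(suggest("lucine", lexicon, 2));
    }
}
```

      With the lexicon in main, suggest("lucine", lexicon, 2) returns lucene before lucerne: lucene shares three of its six padded trigrams with the input, lucerne three of seven, and license and search share none.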
      A fuzzySearch implementation and a neutral synonym TokenFilter could also be built on it.
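      A fuzzy search against the Lexicon could scan its words and keep those within a small Levenshtein edit distance of the query, the same distance measure Lucene's FuzzyQuery is built on. LexiconFuzzy and maxEdits are hypothetical names for this sketch, and the linear scan is the simplest possible strategy, not an optimized one; the benefit assumed here is only that the Lexicon is smaller than the main index's term dictionary.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

/**
 * Hypothetical sketch: fuzzy lookup over a lexicon by Levenshtein distance.
 * A plain linear scan; names are illustrative, not from the patch.
 */
final class LexiconFuzzy {

    /** Levenshtein distance with unit cost for insertion, deletion and substitution. */
    static int levenshtein(String a, String b) {
        int[] prev = new int[b.length() + 1];
        int[] curr = new int[b.length() + 1];
        for (int j = 0; j <= b.length(); j++) {
            prev[j] = j; // distance from "" to the first j chars of b
        }
        for (int i = 1; i <= a.length(); i++) {
            curr[0] = i; // distance from the first i chars of a to ""
            for (int j = 1; j <= b.length(); j++) {
                int cost = a.charAt(i - 1) == b.charAt(j - 1) ? 0 : 1;
                curr[j] = Math.min(Math.min(curr[j - 1] + 1, prev[j] + 1), prev[j - 1] + cost);
            }
            int[] swap = prev; // the row just filled becomes the previous row
            prev = curr;
            curr = swap;
        }
        return prev[b.length()];
    }

    /** Lexicon words within maxEdits edits of query, in iteration order. */
    static List<String> fuzzyMatches(String query, Iterable<String> lexicon, int maxEdits) {
        List<String> matches = new ArrayList<>();
        for (String word : lexicon) {
            // The distance is at least the length difference, so skip early.
            if (Math.abs(word.length() - query.length()) > maxEdits) {
                continue;
            }
            if (levenshtein(query, word) <= maxEdits) {
                matches.add(word);
            }
        }
        return matches;
    }

    public static void main(String[] args) {
        List<String> lexicon = Arrays.asList("lucene", "lucerne", "license", "search");
        System.out.println(fuzzyMatches("lucine", lexicon, 1));
    }
}
```

      fuzzyMatches("lucine", lexicon, 1) keeps only lucene; lucerne is two edits away (one substitution and one insertion).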
      Unused words can be removed on demand (lazy delete?).

      Any criticism or suggestions?

      Attachments

      1. aphone+lexicon.patch (303 kB, Mathieu Lecarme)
      2. aphone+lexicon.patch (336 kB, Mathieu Lecarme)

        Activity

        Mathieu Lecarme created the issue.
        Mathieu Lecarme attached aphone+lexicon.patch [ 12376437 ].
        Mathieu Lecarme attached aphone+lexicon.patch [ 12376860 ].
        Otis Gospodnetic set Assignee to Otis Gospodnetic [ otis ].
        Mark Thomas changed Workflow from jira [ 12424423 ] to Default workflow, editable Closed status [ 12563468 ].
        Mark Thomas changed Workflow from Default workflow, editable Closed status [ 12563468 ] to jira [ 12585023 ].
        Erick Erickson changed Status from Open [ 1 ] to Resolved [ 5 ] and set Resolution to Won't Fix [ 2 ].

          People

          • Assignee:
            Otis Gospodnetic
            Reporter:
            Mathieu Lecarme
          • Votes:
            0
            Watchers:
            3
