Uploaded image for project: 'Lucene - Core'
  1. Lucene - Core
  2. LUCENE-786

Extended javadocs in spellchecker

    XMLWordPrintableJSON

Details

    • Improvement
    • Status: Resolved
    • Trivial
    • Resolution: Fixed
    • 2.0.0
    • None
    • general/javadocs
    • None
    • New, Patch Available

    Description

      Added some javadocs that explains why the spellchecker does not work as one might expect it to.

      http://www.nabble.com/SpellChecker%3A%3AsuggestSimilar%28%29-Question-tf3118660.html#a8640395

      > Without having looked at the code for a long time, I think the problem is what the
      > lucene scoring consider to be best. First the grams are searched, resulting in a number
      > of hits. Then the edit-distance is calculated on each hit. "Genetics" is appearently the
      > third most similar hit according to Lucene, but the best according to Levenshtein.
      >
      > I.e. Lucene does not use edit-distance as similarity. You need to get a bunch of best hits
      > in order to find the one with the smallest edit-distance.

      I took a look at the code, and my assessment seems to be right.

      Attachments

        1. spellcheck_javadocs.diff
          2 kB
          Karl Wettin

        Activity

          People

            otis Otis Gospodnetic
            karl.wettin Karl Wettin
            Votes:
            0 Vote for this issue
            Watchers:
            0 Start watching this issue

            Dates

              Created:
              Updated:
              Resolved: