Details

    • Type: Improvement
    • Status: Resolved
    • Priority: Trivial
    • Resolution: Fixed
    • Affects Version/s: 2.0.0
    • Fix Version/s: None
    • Component/s: general/javadocs
    • Labels:
      None
    • Lucene Fields:
      New, Patch Available

      Description

      Added some javadocs that explain why the spellchecker does not work as one might expect.

      http://www.nabble.com/SpellChecker%3A%3AsuggestSimilar%28%29-Question-tf3118660.html#a8640395

      > Without having looked at the code for a long time, I think the problem is what the
      > Lucene scoring considers to be best. First the grams are searched, resulting in a number
      > of hits. Then the edit-distance is calculated on each hit. "Genetics" is apparently the
      > third most similar hit according to Lucene, but the best according to Levenshtein.
      >
      > I.e. Lucene does not use edit-distance as similarity. You need to get a bunch of best hits
      > in order to find the one with the smallest edit-distance.

      I took a look at the code, and my assessment seems to be right.
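      The effect described in the quote can be reproduced without Lucene: take candidates in the order an n-gram search might rank them, then re-sort them by Levenshtein distance to the input. This is a minimal sketch; the misspelling "genetcs" and the hit order are hypothetical, chosen so that the best match by edit distance is the third n-gram hit, as with "Genetics" in the thread. The class and method names are illustrative only and are not part of the contrib spellchecker.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Comparator;
import java.util.List;

public class EditDistanceRerank {

    // Dynamic-programming Levenshtein distance: insert, delete and substitute each cost 1.
    static int levenshtein(String a, String b) {
        int[] prev = new int[b.length() + 1];
        int[] curr = new int[b.length() + 1];
        for (int j = 0; j <= b.length(); j++) {
            prev[j] = j; // distance from the empty prefix of a to the first j chars of b
        }
        for (int i = 1; i <= a.length(); i++) {
            curr[0] = i;
            for (int j = 1; j <= b.length(); j++) {
                int cost = a.charAt(i - 1) == b.charAt(j - 1) ? 0 : 1;
                curr[j] = Math.min(Math.min(curr[j - 1] + 1, prev[j] + 1), prev[j - 1] + cost);
            }
            int[] tmp = prev;
            prev = curr;
            curr = tmp;
        }
        return prev[b.length()];
    }

    // Re-sorts candidates (given in n-gram score order) by edit distance to the input word.
    // List.sort is stable, so candidates with equal distance keep their n-gram order.
    static List<String> rerank(String word, List<String> ngramOrder) {
        List<String> out = new ArrayList<>(ngramOrder);
        out.sort(Comparator.comparingInt((String s) -> levenshtein(word, s)));
        return out;
    }

    public static void main(String[] args) {
        // Hypothetical hits, assumed to be in the order the n-gram query ranked them.
        List<String> hits = Arrays.asList("generics", "genetic", "genetics");
        System.out.println(rerank("genetcs", hits)); // [genetics, generics, genetic]
    }
}
```

      "genetics" is one insertion away from "genetcs", while the two hits ranked above it by the n-gram query are each two edits away, so the order flips once edit distance is applied.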

        Activity

        Transition       | Time In Source Status | Execution Times | Last Executer    | Last Execution Date
        Open -> Resolved | 35d 3h 33m            | 1               | Otis Gospodnetic | 02/Mar/07 18:29
        Mark Thomas made changes -
        Workflow Default workflow, editable Closed status [ 12562236 ] jira [ 12583244 ]
        Mark Thomas made changes -
        Workflow jira [ 12395113 ] Default workflow, editable Closed status [ 12562236 ]
        Karl Wettin added a comment -

        It might be noteworthy that the spell checker gathers numSug * 10 hits from the a priori corpus. I suppose that number (10) was something the original author came up with while testing; in most cases it seems to be good enough. In my refactor I've introduced a method parameter for the factor. This is probably a cleaner solution than telling the user to increase numSug, since keeping numSug small saves a few clock ticks whenever a suggestion does not need to be added to the priority list.

        The javadocs should probably state something like that instead.
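        The window Karl describes can be sketched as follows: only the first numSug * factor n-gram hits are examined, and a bounded priority queue keeps the numSug closest by edit distance. This is a hypothetical illustration, not the contrib code; `suggest` and its `factor` parameter are assumed names (per the comment, the shipped spell checker hard-codes the factor as 10), and the hits are assumed to arrive as a list already in n-gram score order. Candidates that are no better than the worst kept suggestion are skipped without touching the queue, which is where a small numSug saves work.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Comparator;
import java.util.List;
import java.util.PriorityQueue;

public class SuggestWindow {

    // Dynamic-programming Levenshtein distance: insert, delete and substitute each cost 1.
    static int levenshtein(String a, String b) {
        int[] prev = new int[b.length() + 1];
        int[] curr = new int[b.length() + 1];
        for (int j = 0; j <= b.length(); j++) {
            prev[j] = j;
        }
        for (int i = 1; i <= a.length(); i++) {
            curr[0] = i;
            for (int j = 1; j <= b.length(); j++) {
                int cost = a.charAt(i - 1) == b.charAt(j - 1) ? 0 : 1;
                curr[j] = Math.min(Math.min(curr[j - 1] + 1, prev[j] + 1), prev[j - 1] + cost);
            }
            int[] tmp = prev;
            prev = curr;
            curr = tmp;
        }
        return prev[b.length()];
    }

    // Hypothetical: examine at most numSug * factor n-gram hits and return the
    // numSug closest by edit distance, best first.
    static List<String> suggest(String word, List<String> ngramHits, int numSug, int factor) {
        List<String> result = new ArrayList<>();
        if (numSug <= 0) {
            return result;
        }
        Comparator<String> byDistance = Comparator.comparingInt((String s) -> levenshtein(word, s));
        // Reversed order puts the worst suggestion kept so far at the head of the queue.
        PriorityQueue<String> queue = new PriorityQueue<>(byDistance.reversed());
        int maxHits = Math.min(ngramHits.size(), numSug * factor);
        for (int i = 0; i < maxHits; i++) {
            String candidate = ngramHits.get(i);
            // Queue full and candidate no better than the worst kept: skip the insert.
            if (queue.size() == numSug
                    && levenshtein(word, candidate) >= levenshtein(word, queue.peek())) {
                continue;
            }
            queue.add(candidate);
            if (queue.size() > numSug) {
                queue.poll(); // evict the current worst
            }
        }
        result.addAll(queue);
        result.sort(byDistance);
        return result;
    }

    public static void main(String[] args) {
        // Hypothetical misspelling and n-gram hit order, for illustration only.
        List<String> hits = Arrays.asList("generics", "genetic", "genetics");
        System.out.println(suggest("genetcs", hits, 1, 1));  // [generics]
        System.out.println(suggest("genetcs", hits, 1, 10)); // [genetics]
    }
}
```

        With a factor of 1 only the top n-gram hit is examined and the closer match is never seen; a wider window finds it at the cost of computing more edit distances. Exposing the factor lets callers widen the window without inflating numSug.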

        Otis Gospodnetic made changes -
        Status Open [ 1 ] Resolved [ 5 ]
        Lucene Fields [Patch Available, New] [New, Patch Available]
        Resolution Fixed [ 1 ]
        Otis Gospodnetic added a comment -

        Applied, thanks Karl.

        Otis Gospodnetic made changes -
        Assignee Otis Gospodnetic [ otis ]
        Karl Wettin made changes -
        Field Original Value New Value
        Attachment spellcheck_javadocs.diff [ 12349692 ]
        Karl Wettin added a comment -

        patch root is trunk/contrib/spellcheck

        Karl Wettin created issue -

          People

          • Assignee:
            Otis Gospodnetic
            Reporter:
            Karl Wettin
          • Votes:
            0
            Watchers:
            0

            Dates

            • Created:
              Updated:
              Resolved:
