Lucene - Core / LUCENE-2479

need the ability to also sort SpellCheck results by freq, instead of just by Edit Distance+freq

    Details

    • Type: Improvement
    • Status: Closed
    • Priority: Major
    • Resolution: Fixed
    • Affects Version/s: None
    • Fix Version/s: 3.1, 4.0-ALPHA
    • Component/s: modules/spellchecker
    • Labels: None
    • Environment: all
    • Lucene Fields: New

      Description

      This issue was first noticed and reported in this Solr thread: http://lucene.472066.n3.nabble.com/spellcheck-issues-td489776.html#a489788

      Basically, there are situations where it would be useful to sort by freq first, instead of the current "sort by edit distance, and then subsort by freq if edit distance is equal".

      The author of the thread suggested "What I think would work even better than allowing a custom compareTo function would be to incorporate the frequency directly into the distance function. This would allow for greater control over the trade-off between frequency and edit distance"

      However, custom compareTo functions are not always possible (i.e. if a certain version of Lucene must be used because it was released with Solr), and incorporating freq directly into the distance function may be overkill (i.e. depending on the implementation).

      It is suggested that we make a simple modification to the existing compareTo function in Lucene so that users can specify whether they want the existing sort method or a sort by freq.
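The two orderings discussed above can be sketched as a pair of interchangeable comparators. This is only an illustration under stated assumptions, not the code from the attached patch: the `Suggestion` class, its `score` field (standing in for edit-distance similarity, higher meaning closer), the comparator names, and the sample words and numbers are all hypothetical.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Comparator;
import java.util.List;

public class SuggestionSortSketch {

    /** Hypothetical stand-in for a spelling suggestion; not Lucene's own class. */
    static final class Suggestion {
        final String word;
        final float score; // assumed: similarity to the misspelled term, higher is closer
        final int freq;    // assumed: how often the word occurs in the index

        Suggestion(String word, float score, int freq) {
            this.word = word;
            this.score = score;
            this.freq = freq;
        }
    }

    /** Ordering described as the current one: best score first, freq only breaks ties. */
    static final Comparator<Suggestion> BY_SCORE_THEN_FREQ = (a, b) -> {
        int c = Float.compare(b.score, a.score);
        return c != 0 ? c : Integer.compare(b.freq, a.freq);
    };

    /** Ordering requested in this issue: highest freq first, score only breaks ties. */
    static final Comparator<Suggestion> BY_FREQ_THEN_SCORE = (a, b) -> {
        int c = Integer.compare(b.freq, a.freq);
        return c != 0 ? c : Float.compare(b.score, a.score);
    };

    /** Returns the suggested words ranked by the given ordering; the input list is not modified. */
    static List<String> rank(List<Suggestion> candidates, Comparator<Suggestion> order) {
        List<Suggestion> copy = new ArrayList<>(candidates);
        copy.sort(order);
        List<String> words = new ArrayList<>();
        for (Suggestion s : copy) {
            words.add(s.word);
        }
        return words;
    }

    public static void main(String[] args) {
        // Made-up candidates for a misspelled query term.
        List<Suggestion> candidates = Arrays.asList(
                new Suggestion("lucine", 0.83f, 2),
                new Suggestion("lucene", 0.83f, 950),
                new Suggestion("lucent", 0.66f, 4000));

        System.out.println(rank(candidates, BY_SCORE_THEN_FREQ)); // [lucene, lucine, lucent]
        System.out.println(rank(candidates, BY_FREQ_THEN_SCORE)); // [lucent, lucene, lucine]
    }
}
```

Because the caller chooses the comparator, the existing ordering can remain the default while a frequency-first ordering becomes an opt-in choice.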

      1. LUCENE-2479.patch
        16 kB
        Grant Ingersoll

          Activity

          Grant Ingersoll added a comment -

          Patch that implements the comparator approach. I didn't incorporate the freq into the scoring because this would mean having to look up the freq for every suggestion, which I think would be pretty bad performance-wise.

          I also refactored the Solr SpellCheckComponent a little bit so that it no longer has copy-and-paste versions of the SuggestWord* classes. I intend to commit today or tomorrow. All tests pass and it is backward compatible. I will also port back to 3.x.

          Grant Ingersoll added a comment -

          Committed revision 986477 (trunk). One minor variation from the patch in that I added a bit more testing.

          Committed revision 986495 (3.x).

          Grant Ingersoll added a comment -

          Bulk close for 3.1


            People

            • Assignee:
              Grant Ingersoll
              Reporter:
              Gerald DeConto
            • Votes:
              0
              Watchers:
              2
