Details

    • Type: Bug
    • Status: Closed
    • Priority: Minor
    • Resolution: Fixed
    • Affects Version/s: 1.2
    • Fix Version/s: 5.3, 6.0
    • Component/s: core/search
    • Labels:
      None
    • Environment:

      Operating System: All
      Platform: All

    • Bugzilla Id:
      32942

      Description

      Queries that automatically expand into multiple terms (wildcard, range, prefix,
      fuzzy, etc.) currently suffer from two problems:

      1) Scores for matching documents are significantly lower than for term queries
      because of the number of terms introduced (a match on the query Foo~ scores 0.1,
      whereas a match on the query Foo scores 1).
      2) Rarer forms of the expanded terms are favoured over more common forms
      because of the IDF. With fuzzy queries, for example, rare misspellings
      typically appear in results before the more common correct spellings.
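
      The two effects can be reproduced with a little arithmetic. The sketch below is
      not taken from the attached patch; it assumes the classic Lucene-style formulas
      coord = overlap / maxOverlap and idf = 1 + ln(numDocs / (docFreq + 1)), and the
      term counts and document frequencies are made-up illustrative numbers.

```java
public class ExpandedQueryScoringProblem {

    // Assumed classic coordination factor: fraction of query clauses matched.
    static float coord(int overlap, int maxOverlap) {
        return (float) overlap / maxOverlap;
    }

    // Assumed classic inverse document frequency: 1 + ln(numDocs / (docFreq + 1)).
    static float idf(int docFreq, int numDocs) {
        return (float) (Math.log(numDocs / (double) (docFreq + 1)) + 1.0);
    }

    public static void main(String[] args) {
        // Problem 1: a hypothetical Foo~ expands to 10 terms and the document
        // matches only one of them, so coord shrinks the score; a plain term
        // query Foo has a single clause and is not penalised.
        System.out.println("coord for Foo~ : " + coord(1, 10));
        System.out.println("coord for Foo  : " + coord(1, 1));

        // Problem 2: a rare misspelling (hypothetical docFreq 3) gets a larger
        // IDF than the common correct spelling (hypothetical docFreq 5000).
        int numDocs = 10_000;
        System.out.println("idf common spelling : " + idf(5000, numDocs));
        System.out.println("idf rare misspelling: " + idf(3, numDocs));
    }
}
```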

      I will attach a patch that corrects the issues identified above by:
      1) Overriding Similarity.coord to counteract the reduction in scores
      caused by term expansion.
      2) Using the IDF of the most common form of the expanded terms as the
      basis for scoring all other expanded terms.
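
      As a rough illustration of the two corrections, and not the code in the attached
      patch, the following sketch uses plain Java rather than Lucene's Similarity
      class. The class name, the sharedIdf helper, the constant coord value, and the
      document frequencies are all assumptions made for the example; the IDF formula
      is assumed to be the classic 1 + ln(numDocs / (docFreq + 1)).

```java
import java.util.LinkedHashMap;
import java.util.Map;

public class ExpandedQueryScoringFix {

    // Assumed classic inverse document frequency: 1 + ln(numDocs / (docFreq + 1)).
    static float idf(int docFreq, int numDocs) {
        return (float) (Math.log(numDocs / (double) (docFreq + 1)) + 1.0);
    }

    // Correction 1 (sketch): a coord that ignores how many expanded terms exist,
    // so the score is no longer scaled down by the size of the expansion.
    static float coord(int overlap, int maxOverlap) {
        return 1.0f;
    }

    // Correction 2 (sketch): every expanded term is scored with the IDF of the
    // most common variant, i.e. the one with the highest document frequency.
    static float sharedIdf(Map<String, Integer> docFreqs, int numDocs) {
        int maxDocFreq = 0;
        for (int docFreq : docFreqs.values()) {
            maxDocFreq = Math.max(maxDocFreq, docFreq);
        }
        return idf(maxDocFreq, numDocs);
    }

    public static void main(String[] args) {
        int numDocs = 10_000;

        // Hypothetical expansion of a fuzzy query with made-up frequencies.
        Map<String, Integer> docFreqs = new LinkedHashMap<>();
        docFreqs.put("lucene", 5000); // common correct spelling
        docFreqs.put("lucine", 3);    // rare misspelling

        float shared = sharedIdf(docFreqs, numDocs);
        for (String term : docFreqs.keySet()) {
            // Both variants now receive the same IDF, so rarity alone no longer
            // pushes the misspelling ahead of the correct spelling.
            System.out.println(term + " -> idf " + shared
                    + ", coord " + coord(1, docFreqs.size()));
        }
    }
}
```

      With a shared IDF, ranking among the expanded terms is left to the other
      scoring factors, such as term frequency, instead of the rarity of each variant.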

        Attachments

        1. ASF.LICENSE.NOT.GRANTED--patch.txt
          11 kB
          Mark Harwood
        2. LUCENE-329.patch
          13 kB
          Mark Harwood
        3. LUCENE-329.patch
          13 kB
          Mark Harwood
        4. LUCENE-329.patch
          12 kB
          Mark Harwood
        5. LUCENE-329.patch
          11 kB
          Mark Harwood

              People

              • Assignee:
                markh Mark Harwood
                Reporter:
                markharw00d@yahoo.co.uk Mark Harwood
              • Votes:
                3
                Watchers:
                9

                Dates

                • Created:
                  Updated:
                  Resolved: