Uploaded image for project: 'Lucene - Core'
  1. Lucene - Core
  2. LUCENE-2230

Lucene Fuzzy Search: BK-Tree can improve performance 3-20 times.

    Details

    • Type: Improvement
    • Status: Open
    • Priority: Major
    • Resolution: Unresolved
    • Affects Version/s: 3.0
    • Fix Version/s: None
    • Component/s: core/search
    • Labels:
      None
    • Environment:
    • Lucene Fields:
      New, Patch Available

      Description

      W. Burkhard and R. Keller. Some approaches to best-match file searching, CACM, 1973
      http://portal.acm.org/citation.cfm?doid=362003.362025

      I was inspired by http://blog.notdot.net/2007/4/Damn-Cool-Algorithms-Part-1-BK-Trees (Nick Johnson, Google).

      Additionally, simplified algorythm at http://www.catalysoft.com/articles/StrikeAMatch.html seems to be much more logically correct than Levenstein distance, and it is 3-5 times faster (isolated tests).

      Big list od distance implementations:
      http://www.dcs.shef.ac.uk/~sam/stringmetrics.htm

        Attachments

        1. FuzzyTermEnumNEW.java
          4 kB
          Fuad Efendi
        2. FuzzyTermEnumNEW.java
          4 kB
          Fuad Efendi
        3. DistanceImpl.java
          4 kB
          Fuad Efendi
        4. Distance.java
          0.1 kB
          Fuad Efendi
        5. BKTree.java
          3 kB
          Fuad Efendi

          Activity

            People

            • Assignee:
              Unassigned
              Reporter:
              funtick Fuad Efendi
            • Votes:
              0 Vote for this issue
              Watchers:
              6 Start watching this issue

              Dates

              • Created:
                Updated:

                Time Tracking

                Estimated:
                Original Estimate - 1m
                1m
                Remaining:
                Remaining Estimate - 1m
                1m
                Logged:
                Time Spent - Not Specified
                Not Specified