Lucene - Core
  1. Lucene - Core
  2. LUCENE-2230

Lucene Fuzzy Search: BK-Tree can improve performance 3-20 times.

    Details

    • Type: Improvement Improvement
    • Status: Open
    • Priority: Major Major
    • Resolution: Unresolved
    • Affects Version/s: 3.0
    • Fix Version/s: None
    • Component/s: core/search
    • Labels:
      None
    • Environment:
    • Lucene Fields:
      New, Patch Available

      Description

      W. Burkhard and R. Keller. Some approaches to best-match file searching, CACM, 1973
      http://portal.acm.org/citation.cfm?doid=362003.362025

      I was inspired by http://blog.notdot.net/2007/4/Damn-Cool-Algorithms-Part-1-BK-Trees (Nick Johnson, Google).

      Additionally, simplified algorythm at http://www.catalysoft.com/articles/StrikeAMatch.html seems to be much more logically correct than Levenstein distance, and it is 3-5 times faster (isolated tests).

      Big list od distance implementations:
      http://www.dcs.shef.ac.uk/~sam/stringmetrics.htm

      1. FuzzyTermEnumNEW.java
        4 kB
        Fuad Efendi
      2. FuzzyTermEnumNEW.java
        4 kB
        Fuad Efendi
      3. DistanceImpl.java
        4 kB
        Fuad Efendi
      4. Distance.java
        0.1 kB
        Fuad Efendi
      5. BKTree.java
        3 kB
        Fuad Efendi

        Activity

        No work has yet been logged on this issue.

          People

          • Assignee:
            Unassigned
            Reporter:
            Fuad Efendi
          • Votes:
            0 Vote for this issue
            Watchers:
            9 Start watching this issue

            Dates

            • Created:
              Updated:

              Time Tracking

              Estimated:
              Original Estimate - 1m
              1m
              Remaining:
              Remaining Estimate - 1m
              1m
              Logged:
              Time Spent - Not Specified
              Not Specified

                Development