Lucene - Core
  1. Lucene - Core
  2. LUCENE-2230

Lucene Fuzzy Search: BK-Tree can improve performance 3-20 times.

    Details

    • Type: Improvement Improvement
    • Status: Open
    • Priority: Major Major
    • Resolution: Unresolved
    • Affects Version/s: 3.0
    • Fix Version/s: None
    • Component/s: core/search
    • Labels:
      None
    • Environment:
    • Lucene Fields:
      New, Patch Available

      Description

      W. Burkhard and R. Keller. Some approaches to best-match file searching, CACM, 1973
      http://portal.acm.org/citation.cfm?doid=362003.362025

      I was inspired by http://blog.notdot.net/2007/4/Damn-Cool-Algorithms-Part-1-BK-Trees (Nick Johnson, Google).

      Additionally, simplified algorythm at http://www.catalysoft.com/articles/StrikeAMatch.html seems to be much more logically correct than Levenstein distance, and it is 3-5 times faster (isolated tests).

      Big list od distance implementations:
      http://www.dcs.shef.ac.uk/~sam/stringmetrics.htm

      1. BKTree.java
        3 kB
        Fuad Efendi
      2. Distance.java
        0.1 kB
        Fuad Efendi
      3. DistanceImpl.java
        4 kB
        Fuad Efendi
      4. FuzzyTermEnumNEW.java
        4 kB
        Fuad Efendi
      5. FuzzyTermEnumNEW.java
        4 kB
        Fuad Efendi

        Activity

        Fuad Efendi created issue -
        Fuad Efendi made changes -
        Field Original Value New Value
        Attachment BKTree.java [ 12431059 ]
        Attachment Distance.java [ 12431060 ]
        Attachment DistanceImpl.java [ 12431061 ]
        Fuad Efendi made changes -
        Attachment FuzzyTermEnumNEW.java [ 12431063 ]
        Fuad Efendi made changes -
        Attachment FuzzyTermEnumNEW.java [ 12431186 ]
        Robert Muir made changes -
        Component/s Search [ 12310235 ]
        Mark Thomas made changes -
        Workflow jira [ 12488966 ] Default workflow, editable Closed status [ 12563097 ]
        Mark Thomas made changes -
        Workflow Default workflow, editable Closed status [ 12563097 ] jira [ 12584138 ]

          People

          • Assignee:
            Unassigned
            Reporter:
            Fuad Efendi
          • Votes:
            0 Vote for this issue
            Watchers:
            9 Start watching this issue

            Dates

            • Created:
              Updated:

              Time Tracking

              Estimated:
              Original Estimate - 1m
              1m
              Remaining:
              Remaining Estimate - 1m
              1m
              Logged:
              Time Spent - Not Specified
              Not Specified

                Development