Uploaded image for project: 'Lucene - Core'
  1. Lucene - Core
  2. LUCENE-2230

Lucene Fuzzy Search: BK-Tree can improve performance 3-20 times.

Details

    • Improvement
    • Status: Open
    • Major
    • Resolution: Unresolved
    • 3.0
    • None
    • core/search
    • None
    • New, Patch Available

    Description

      W. Burkhard and R. Keller. Some approaches to best-match file searching, CACM, 1973
      http://portal.acm.org/citation.cfm?doid=362003.362025

      I was inspired by http://blog.notdot.net/2007/4/Damn-Cool-Algorithms-Part-1-BK-Trees (Nick Johnson, Google).

      Additionally, simplified algorythm at http://www.catalysoft.com/articles/StrikeAMatch.html seems to be much more logically correct than Levenstein distance, and it is 3-5 times faster (isolated tests).

      Big list od distance implementations:
      http://www.dcs.shef.ac.uk/~sam/stringmetrics.htm

      Attachments

        1. FuzzyTermEnumNEW.java
          4 kB
          Fuad Efendi
        2. FuzzyTermEnumNEW.java
          4 kB
          Fuad Efendi
        3. DistanceImpl.java
          4 kB
          Fuad Efendi
        4. Distance.java
          0.1 kB
          Fuad Efendi
        5. BKTree.java
          3 kB
          Fuad Efendi

        Activity

          People

            Unassigned Unassigned
            funtick Fuad Efendi

            Dates

              Created:
              Updated:

              Time Tracking

                Estimated:
                Original Estimate - 1m
                1m
                Remaining:
                Remaining Estimate - 1m
                1m
                Logged:
                Time Spent - Not Specified
                Not Specified

                Slack

                  Issue deployment