Cédrik LIME added a comment - 21/Feb/08 11:22 AM
New TRStringDistance implementation, attached both as a patch and as a complete source file.
It occurs to me that we apparently have two different implementations of Levenshtein distance: one in the spellchecker and one for FuzzyQuery. I haven't analyzed them individually to know for sure, but if this is a much better implementation, we should think about using it for FuzzyQuery, too.
The FuzzyQuery (FuzzyTermEnum) version claims to have a fast-fail mechanism, too:
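Both implementations compute the Levenshtein (edit) distance. The sketch below is a generic illustration only. It is a hypothetical class, not the TRStringDistance or FuzzyTermEnum source, and the names `LevenshteinSketch`, `distance` and `boundedDistance` are made up for this example.

The sketch keeps two rows of the dynamic-programming table instead of the full (n+1) x (m+1) matrix. It also adds one possible fast-fail: the minimum value of each row never decreases from one row to the next, so the computation can stop as soon as that minimum exceeds a caller-supplied `maxDistance`.

```java
// Hypothetical sketch: NOT the TRStringDistance or FuzzyTermEnum source code.
public final class LevenshteinSketch {

    private LevenshteinSketch() {}

    /** Plain Levenshtein distance, using two rows of size t.length() + 1. */
    public static int distance(String s, String t) {
        return boundedDistance(s, t, Integer.MAX_VALUE);
    }

    /**
     * Levenshtein distance with a fast-fail bound (maxDistance must be >= 0).
     * Returns the exact distance if it is <= maxDistance; otherwise returns
     * some value greater than maxDistance, possibly without filling every row.
     */
    public static int boundedDistance(String s, String t, int maxDistance) {
        int n = s.length();
        int m = t.length();

        // The distance is at least the length difference: fail before allocating.
        if (Math.abs(n - m) > maxDistance) {
            return maxDistance + 1;
        }

        int[] prev = new int[m + 1]; // row i - 1 of the DP table
        int[] curr = new int[m + 1]; // row i of the DP table
        for (int j = 0; j <= m; j++) {
            prev[j] = j; // distance from "" to the first j chars of t
        }

        for (int i = 1; i <= n; i++) {
            curr[0] = i; // distance from the first i chars of s to ""
            int rowMin = curr[0];
            char sc = s.charAt(i - 1);
            for (int j = 1; j <= m; j++) {
                int cost = (sc == t.charAt(j - 1)) ? 0 : 1;
                int insert = curr[j - 1] + 1;
                int delete = prev[j] + 1;
                int substitute = prev[j - 1] + cost;
                curr[j] = Math.min(Math.min(insert, delete), substitute);
                rowMin = Math.min(rowMin, curr[j]);
            }
            // Row minima never decrease, so the final distance is >= rowMin.
            if (rowMin > maxDistance) {
                return maxDistance + 1;
            }
            // Reuse the old row as the next "current" row.
            int[] tmp = prev;
            prev = curr;
            curr = tmp;
        }
        return prev[m];
    }
}
```

For example, `distance("kitten", "sitting")` is 3. A fuzzy matcher that only cares whether two terms are within, say, 1 edit can call `boundedDistance(a, b, 1)`, which may give up after only a few rows. Memory stays proportional to the length of `t` rather than to the product of both lengths.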
Cedrik, since you seem to know about these things, would you have time to look at FuzzyTermEnum? A 3x speedup there would be great for users, too.
You caught me while I was finalizing a patch for FuzzyTermEnum...
Also see
Well spotted, Karl! My version is very similar to
Can someone link those 2 issues together in the meantime? (There are too many options in the drop-down; I don't know which one to choose.)
New patch for FuzzyTermEnum, incorporating most of
Committed the TRStringDistance patch – thank you!
Committed revision 659016. I'll leave the FuzzyTermEnum patch for a later date. Is there anything in Bob's FuzzyTermEnum that is not in this patch? Anything that you'd want to add, Cédrik?
All of Bob's FuzzyTermEnum patch is in my patch. I only left out some small optimizations that didn't bring much but did hurt code readability. In other words, should you commit my patch, you will have most of (99.9%)
I think this is an important patch for Lucene 2.4, as it brings vast performance improvements in fuzzy search (no hard numbers, sorry). Any news on when this patch will land?
Now that Lucene 2.9 is out, the much lower memory usage and the speed-up would be a welcome addition to Lucene 3.0's fuzzy search! Cédrik, could you update the patch to trunk?
It sounds like a compelling improvement. We should get it in.
Thanks, Michael.
FuzzyTermEnum.java has not changed in more than 2 years. The uploaded patch (FuzzyTermEnum.patch) is still valid for trunk.
OK, I had 2 hunks fail, but I managed to apply them.