Description
Today, if you ask FuzzyQuery to match abcd with edit distance 2, it will fail to match the term ab even though it's 2 edits away.
Its javadocs explain this:
NOTE: terms of length 1 or 2 will sometimes not match because of how the scaled distance between two terms is computed. For a term to match, the edit distance between the terms must be less than the minimum length term (either the input term, or the candidate term). For example, FuzzyQuery on term "abcd" with maxEdits=2 will not match an indexed term "ab", and FuzzyQuery on term "a" with maxEdits=2 will not match an indexed term "abc".
On the one hand, I can see that this behavior is somewhat justified: 50% of the characters are different, so this is a very "weak" match. On the other hand, it is quite unexpected, since edit distance is an exact measure and the terms should have matched.
It seems like the behavior is caused by internal implementation details about how the relative (floating point) score is computed. I think we should fix it, so that edit distance 2 does in fact match all terms with edit distance <= 2.
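To make the difference concrete, here is a minimal standalone sketch of the rule the javadocs describe (edit distance must be less than the shorter term's length) next to the proposed behavior (edit distance <= maxEdits). It does not use Lucene at all: the class and method names are hypothetical, and it uses plain Levenshtein distance without transpositions, which is an assumption made to keep the example short, not a description of FuzzyQuery's internals.

```java
// Hypothetical illustration only; not Lucene code.
public final class FuzzyLengthRuleSketch {

    // Plain Levenshtein distance (insert, delete, substitute; no transpositions).
    public static int editDistance(String a, String b) {
        int[] prev = new int[b.length() + 1];
        int[] curr = new int[b.length() + 1];
        for (int j = 0; j <= b.length(); j++) {
            prev[j] = j; // distance from the empty prefix of a
        }
        for (int i = 1; i <= a.length(); i++) {
            curr[0] = i; // distance to the empty prefix of b
            for (int j = 1; j <= b.length(); j++) {
                int cost = a.charAt(i - 1) == b.charAt(j - 1) ? 0 : 1;
                curr[j] = Math.min(Math.min(curr[j - 1] + 1, prev[j] + 1), prev[j - 1] + cost);
            }
            int[] tmp = prev;
            prev = curr;
            curr = tmp;
        }
        return prev[b.length()];
    }

    // Rule as stated in the javadoc note: the distance must also be
    // strictly less than the length of the shorter term.
    public static boolean matchesDocumented(String query, String candidate, int maxEdits) {
        int d = editDistance(query, candidate);
        return d <= maxEdits && d < Math.min(query.length(), candidate.length());
    }

    // Behavior proposed in this issue: match whenever distance <= maxEdits.
    public static boolean matchesProposed(String query, String candidate, int maxEdits) {
        return editDistance(query, candidate) <= maxEdits;
    }

    public static void main(String[] args) {
        System.out.println(matchesDocumented("abcd", "ab", 2)); // false
        System.out.println(matchesProposed("abcd", "ab", 2));   // true
        System.out.println(matchesDocumented("a", "abc", 2));   // false
        System.out.println(matchesProposed("a", "abc", 2));     // true
    }
}
```

Both examples from the javadoc note have an edit distance of exactly 2, so they fail only because of the length comparison, not because they exceed maxEdits.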
Attachments
Issue Links
- is related to
LUCENE-5206 FuzzyQuery: matching terms must be longer than maxEdits
- Patch Available