Affects Version/s: None
Fix Version/s: None
Local optimization of the current edit-distance handling code in AsterixDB: Can we terminate the edit distance calculation based on a given edit-distance threshold T? I think the answer is yes.
For two strings X and Y,
Construct D(len(X)+1=M, len(Y)+1=N).
D(i,0) = i
D(j,0) = j
D(M,N) is the edit distance value.
Early Termination condition:
So, for the given threshold T, we can stop the computation early if the values of three cases are greater than T. That is,
min (D(i-1,j), D(i,j-1), D(i-1,j-1)) > T
This holds since if the cost of all possible cases (insertion, deletion, and substitution) is greater than T, all future operations will be greater than T in any cases.