- Type: New Feature
- Status: Resolved
- Priority: Major
- Resolution: Fixed
- Affects Version/s: None
- Fix Version/s: Impala 3.3.0
- Component/s: None
References:
- [Apache Commons Text - JaroWinklerDistance|https://commons.apache.org/proper/commons-text/apidocs/org/apache/commons/text/similarity/JaroWinklerDistance.html]
- [Apache Commons Text - JaroWinklerSimilarity|https://commons.apache.org/proper/commons-text/apidocs/org/apache/commons/text/similarity/JaroWinklerSimilarity.html]
- [Oracle UTL_MATCH - JARO_WINKLER and JARO_WINKLER_SIMILARITY|https://oracle-base.com/articles/11g/utl_match-string-matching-in-oracle]
Notable differences:
- For similarity, the Oracle version (JARO_WINKLER_SIMILARITY) returns a normalized result ranging from 0 to 100.
- In the Apache version, null values result in exceptions.
- Apache rounds the values to two decimal places.
The scaling factor of the algorithm can be exposed as an optional extra argument with a default value.
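To make the proposal concrete, here is a minimal Python sketch of Jaro-Winkler similarity with the scaling factor as an optional argument. It is not the Impala implementation. The function names, the default of 0.1, the 0.25 upper bound, and returning None for a null input (instead of throwing, as the Apache version does) are all assumptions for illustration.

```python
from typing import Optional


def jaro_similarity(s1: str, s2: str) -> float:
    """Plain Jaro similarity in [0.0, 1.0]."""
    if s1 == s2:
        return 1.0
    len1, len2 = len(s1), len(s2)
    if len1 == 0 or len2 == 0:
        return 0.0
    # Two characters match only if equal and at most `window` positions apart.
    window = max(max(len1, len2) // 2 - 1, 0)
    matched1 = [False] * len1
    matched2 = [False] * len2
    matches = 0
    for i, ch in enumerate(s1):
        lo = max(0, i - window)
        hi = min(len2, i + window + 1)
        for j in range(lo, hi):
            if not matched2[j] and s2[j] == ch:
                matched1[i] = matched2[j] = True
                matches += 1
                break
    if matches == 0:
        return 0.0
    # Count matched characters that appear in a different order.
    k = 0
    half_transpositions = 0
    for i in range(len1):
        if matched1[i]:
            while not matched2[k]:
                k += 1
            if s1[i] != s2[k]:
                half_transpositions += 1
            k += 1
    t = half_transpositions / 2
    return (matches / len1 + matches / len2 + (matches - t) / matches) / 3


def jaro_winkler_similarity(
    s1: Optional[str], s2: Optional[str], scaling_factor: float = 0.1
) -> Optional[float]:
    """Jaro-Winkler similarity; scaling_factor is the optional extra argument."""
    # Assumption: a null input yields a null result (SQL style), not an exception.
    if s1 is None or s2 is None:
        return None
    # Assumption: cap at 0.25 so a 4-character prefix cannot push the result above 1.
    if not 0.0 <= scaling_factor <= 0.25:
        raise ValueError("scaling_factor must be between 0.0 and 0.25")
    jaro = jaro_similarity(s1, s2)
    # Length of the common prefix, capped at 4 characters.
    prefix = 0
    for a, b in zip(s1[:4], s2[:4]):
        if a != b:
            break
        prefix += 1
    return jaro + prefix * scaling_factor * (1.0 - jaro)


def jaro_winkler_distance(
    s1: Optional[str], s2: Optional[str], scaling_factor: float = 0.1
) -> Optional[float]:
    """Distance is taken here as 1 - similarity."""
    sim = jaro_winkler_similarity(s1, s2, scaling_factor)
    return None if sim is None else 1.0 - sim
```

With the default factor, `jaro_winkler_similarity("MARTHA", "MARHTA")` is about 0.961. Passing `scaling_factor=0.0` reduces the result to the plain Jaro similarity (about 0.944), which is why a default value seems preferable to a mandatory argument.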