Details
Type: Improvement
Status: Resolved
Priority: Major
Resolution: Fixed
0.12.0
None
None
Description
The FST linking engine already allows configuring, as a percentage, how much of a processable chunk (typically a noun phrase) needs to match for a suggestion to be accepted. This is done with the "enhancer.engines.linking.minChunkMatchScore" property. The default is > 50%.
While this kind of configuration works well for chunks created by NamedEntityAnnotations, it is not always well suited to detected noun phrases, as those may span larger sections of a sentence. E.g. "goalie Mathias Lange (Iserlohn Roosters)" will not match any Entity in a vocabulary: it contains 5 matchable tokens, but the player "Mathias Lange" and the team name "Iserlohn Roosters" each cover only two of them (2 of 5 = 40%, which is below the default threshold).
In such cases, configuring a fixed lower limit on the number of (matchable) tokens that need to match within a chunk can be preferable.
For this configuration the FST linking engine will use the "Min Matched Tokens (enhancer.engines.linking.minFoundTokens)" property of the EntityLinker configuration. The default will be "2".
The FST linking engine will accept suggestions that conform to either "enhancer.engines.linking.minChunkMatchScore" or "enhancer.engines.linking.minFoundTokens".
NOTE: these settings apply only to tokens within a processable chunk (typically a noun phrase).
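
The acceptance rule described above can be illustrated with a minimal sketch. This is not the engine's actual code: the class name ChunkMatchSketch, the method accept and its parameter names are hypothetical. It also assumes that the score is compared strictly against the threshold (following "> 50%") and that the token limit is inclusive (following "lower limit"); neither detail is confirmed by this issue.

```java
// Hypothetical sketch of the acceptance rule described in this issue.
// NOT the actual Stanbol FST linking engine implementation.
class ChunkMatchSketch {

    /**
     * Decides whether a suggestion within a processable chunk is accepted.
     *
     * @param matchedTokens          matchable tokens of the chunk covered by the suggestion
     * @param matchableTokensInChunk total number of matchable tokens in the chunk
     * @param minChunkMatchScore     fraction of the chunk that must match (assumed strict, e.g. 0.5)
     * @param minFoundTokens         fixed lower limit of matched tokens (assumed inclusive, e.g. 2)
     */
    static boolean accept(int matchedTokens, int matchableTokensInChunk,
                          double minChunkMatchScore, int minFoundTokens) {
        if (matchableTokensInChunk <= 0) {
            return false; // no matchable tokens, nothing to accept
        }
        double chunkMatchScore = (double) matchedTokens / matchableTokensInChunk;
        boolean scoreOk = chunkMatchScore > minChunkMatchScore; // percentage criterion
        boolean tokensOk = matchedTokens >= minFoundTokens;     // fixed token count criterion
        return scoreOk || tokensOk; // either criterion is sufficient
    }

    public static void main(String[] args) {
        // "goalie Mathias Lange (Iserlohn Roosters)": 5 matchable tokens,
        // "Mathias Lange" covers 2 of them (40%), but meets the 2-token limit.
        System.out.println(accept(2, 5, 0.5, 2)); // true
        // A single matched token out of 5 satisfies neither criterion.
        System.out.println(accept(1, 5, 0.5, 2)); // false
        // A single-token chunk that matches fully passes on the score alone.
        System.out.println(accept(1, 1, 0.5, 2)); // true
    }
}
```

With only the percentage criterion, the "Mathias Lange" suggestion from the example would be rejected; the additional token count criterion is what lets it through.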